user3209815 - 2 months ago
Java Question

map function is never executed

I wrote a method that takes a BlockMatrix and sets all values that are not 0 to 1.

public BlockMatrix SetNonZeroesToOnes(BlockMatrix matrix)
{
    // initialize
    JavaRDD<MatrixEntry> matrixEntries = matrix.toCoordinateMatrix().entries().toJavaRDD();

    // transformation
    matrixEntries.map(t ->
    {
        if (t.value() != 0)
            return new MatrixEntry(t.i(), t.j(), 1);

        return new MatrixEntry(t.i(), t.j(), 0);
    });

    // action
    List<MatrixEntry> list = matrixEntries.collect();
    for (MatrixEntry matrixEntry : list)
        System.out.println("(" + matrixEntry.i() + ", " + matrixEntry.j() + ") = " + matrixEntry.value());

    // return value
    CoordinateMatrix coordMat = new CoordinateMatrix(matrixEntries.rdd(), matrix.numRows(), matrix.numCols());
    return coordMat.toBlockMatrix();
}

The problem is that the map function is never executed. I haven't integrated the method with the rest of my code yet; for now I'm just running JUnit tests on it. The test setup is simple: a BlockMatrix is generated from data parallelized by a local Spark context and passed to the method.

I know about Spark's lazy execution, which is why I added the collect call: an action ought to trigger the execution of the preceding transformations. Note that it shouldn't be there in the final version, as other methods will perform actions on the data set.

I've even added trace logs inside the map function, and they are never logged. The debugger won't step into it and, of course, the transformation is never applied.

So, the question is, what am I missing here? Why is this map -> collect call different from other similar ones?


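You are ignoring the result of the map call, so Spark never even tries to compute it. RDDs are immutable: map returns a new RDD and leaves matrixEntries unchanged, so the later collect only evaluates the original, unmapped RDD. If you do not need a reference to the original matrix, you should write matrixEntries =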
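
To illustrate why a discarded map result never runs, here is a rough, self-contained sketch in plain Java. It does not use Spark at all: LazyData, toOnes, ignoringResult and keepingResult are hypothetical names made up for this example, and LazyData only imitates the two properties that matter here (transformations return a new dataset, and nothing is computed until an action is called). The counter records how often the mapping function is actually invoked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model only: LazyData is a hypothetical stand-in for an immutable,
// lazily evaluated dataset. It is not Spark's API.
final class LazyMapDemo {

    static final class LazyData<T> {
        private final Supplier<List<T>> compute;

        private LazyData(Supplier<List<T>> compute) {
            this.compute = compute;
        }

        static <T> LazyData<T> of(List<T> items) {
            List<T> snapshot = new ArrayList<>(items);
            return new LazyData<T>(() -> new ArrayList<>(snapshot));
        }

        // Transformation: records fn in a NEW LazyData. Nothing runs yet
        // and the receiver is left unchanged.
        <R> LazyData<R> map(Function<? super T, ? extends R> fn) {
            Supplier<List<T>> parent = this.compute;
            return new LazyData<R>(() -> {
                List<R> out = new ArrayList<>();
                for (T item : parent.get()) {
                    out.add(fn.apply(item));
                }
                return out;
            });
        }

        // Action: evaluates this dataset and everything it was derived from.
        List<T> collect() {
            return compute.get();
        }
    }

    // Same rule as in the question: non-zero values become 1, zeros stay 0.
    // The counter is incremented on every invocation.
    static Function<Double, Double> toOnes(AtomicInteger calls) {
        return v -> {
            calls.incrementAndGet();
            return v != 0 ? 1.0 : 0.0;
        };
    }

    // Pattern from the question: the dataset returned by map is discarded.
    static List<Double> ignoringResult(List<Double> values, AtomicInteger calls) {
        LazyData<Double> entries = LazyData.of(values);
        entries.map(toOnes(calls));   // new dataset is thrown away
        return entries.collect();     // evaluates the original only
    }

    // Pattern from the answer: keep the dataset that map returns.
    static List<Double> keepingResult(List<Double> values, AtomicInteger calls) {
        LazyData<Double> entries = LazyData.of(values);
        entries = entries.map(toOnes(calls));
        return entries.collect();
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(0.0, 2.5, 0.0, -1.0);

        AtomicInteger ignoredCalls = new AtomicInteger();
        System.out.println(ignoringResult(values, ignoredCalls) + " calls=" + ignoredCalls.get());
        // prints: [0.0, 2.5, 0.0, -1.0] calls=0

        AtomicInteger keptCalls = new AtomicInteger();
        System.out.println(keepingResult(values, keptCalls) + " calls=" + keptCalls.get());
        // prints: [0.0, 1.0, 0.0, 1.0] calls=4
    }
}
```

In the Spark method from the question, the analogous change would be to keep the RDD that map returns and to use that one for collect and for building the CoordinateMatrix.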