Maroof Maroof - 1 year ago 56
Scala Question

Spark Task not serializable due to assert statement

I am getting a "Task not serializable" error caused by an assert statement inside the foreach method on an RDD. Is there a workaround for asserting on every element of an RDD?

class myTest extends FunSuite {

  // some code to create the Spark context (sc)

  var arrRDD = sc.parallelize(Array(1, 1, 1, 1, 1))

  test("custom test") {
    arrRDD.foreach { x =>
      // commenting out this assert removes the error
      assert(x == 1)
    }
  }

}

Answer Source

An RDD (Resilient Distributed Dataset) is a collection distributed across the nodes of a cluster. The API abstracts this away, so in code it looks like a collection on a single machine.

When you call an RDD operation such as map, filter, or foreach, the function you pass is serialized, shipped to the other nodes in the cluster, and executed there.

The "Task not serializable" error arises because the function passed to arrRDD.foreach has to be serialized, but the assert it calls belongs to the test suite. The closure therefore drags in the suite instance, which is not serializable, so it cannot be shipped to the other nodes.
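To see the capture problem in isolation, here is a minimal sketch in plain Scala, with no Spark or ScalaTest involved. It assumes Scala 2.12 or later, where function literals are serializable lambdas. The names NotSerializableSuite, check, capturing, standalone and ClosureDemo are hypothetical stand-ins, and Java serialization is used only as a rough approximation of what Spark does before shipping a task.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a test suite: it is NOT Serializable
// and exposes an instance method, much like assert on a suite.
class NotSerializableSuite {
  def check(cond: Boolean): Unit =
    if (!cond) throw new AssertionError("check failed")

  // Calls an instance method, so the closure captures `this`.
  def capturing: Int => Unit = x => check(x == 1)

  // Uses only its argument, so nothing from the enclosing class is captured.
  def standalone: Int => Boolean = x => x == 1
}

object ClosureDemo {
  // Try to Java-serialize the function object and report whether it worked.
  def isSerializable(f: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(f)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val suite = new NotSerializableSuite
    println(isSerializable(suite.capturing))  // closure that captures the suite
    println(isSerializable(suite.standalone)) // closure that captures nothing
  }
}
```

Under those assumptions, the first call should report false and the second true: the only difference between the two closures is whether they reach back into the enclosing, non-serializable instance.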

If you are trying to assert on the values, you can collect them first, which brings the data to the driver node as an array, and then assert there:

arrRDD.collect().foreach{
  x => assert (x == 1)
}

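That said, I don't think this is a good approach in general: collect() pulls the whole RDD onto the driver, so it is only reasonable for small test data.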
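One possible alternative, sketched here under assumptions, is to run the comparison on the executors and assert only on the resulting count, which is a plain Long on the driver. RddChecks and countMismatches are hypothetical names, not part of Spark. The closure passed to filter uses only a local Int, so it should not pull in the test suite.

```scala
import org.apache.spark.rdd.RDD

object RddChecks {
  // Hypothetical helper: counts the elements that differ from `expected`.
  // The filter closure captures only `expected` (an Int), not a test suite.
  def countMismatches(rdd: RDD[Int], expected: Int): Long =
    rdd.filter(_ != expected).count()
}
```

Inside the test from the question, the check would then read assert(RddChecks.countMismatches(arrRDD, 1) == 0). The assert runs on the driver, and only a single number travels back from the cluster.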

Hope this helped you :)
