gsjunior86 - 15 days ago
Scala Question

How to correctly sum an integer inside .map function in Spark?

I'm new to Scala and Spark. I'm trying to create a pair-style RDD in Spark by assigning an incrementing Int key to each line:

val mapUrls = urls.map {
  var cont = 0
  x =>
    cont += 1
    (cont, x)
}


The problem is that the cont variable somehow resets to 1 after a while.

What am I doing wrong?

Answer

Is this what you want?

urls.zipWithIndex.map(_.swap)

Your code cannot work correctly. Remember that Spark is a distributed framework with no shared memory: the map closure is shipped to each task, and each task increments its own copy of cont, so the counter restarts whenever a new task (e.g. a new partition) begins.
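The zipWithIndex/swap idiom can be illustrated with plain Scala collections, which expose the same method as RDDs (the sample urls list below is hypothetical). Note that zipWithIndex starts at 0, while your cont started at 1, so shift the index if that matters:

```scala
// Sample data standing in for the lines of the RDD (hypothetical).
val urls = List("a.com", "b.com", "c.com")

// zipWithIndex pairs each element with its position: (elem, index).
// swap turns that into (index, elem), i.e. index-keyed pairs.
val indexed = urls.zipWithIndex.map(_.swap)
// indexed == List((0, "a.com"), (1, "b.com"), (2, "c.com"))

// If you want keys starting at 1, like your cont variable:
val oneBased = urls.zipWithIndex.map { case (u, i) => (i + 1, u) }
// oneBased == List((1, "a.com"), (2, "b.com"), (3, "c.com"))
```

On an RDD the indices are computed per partition and then offset, so they are globally consistent without any shared mutable state; if you only need unique (not consecutive) keys, RDD.zipWithUniqueId avoids the extra job that zipWithIndex triggers.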