
Scala - Keep Map in foreach

var myMap: Map[String, Int] = Map()
myRDD.foreach { data =>
  println("1. " + data.name + " : " + data.time)
  myMap += (data.name -> data.time)
  println("2. " + myMap)
}
println("Total Map : " + myMap)


Result

1. A : 1
2. Map(A -> 1)
1. B : 2
2. Map(B -> 2)    // key A was deleted
1. C : 3
2. Map(C -> 3)    // keys A and B were deleted
Total Map : Map() // empty


Somehow I cannot accumulate data in the Map inside foreach. It keeps deleting or re-initializing the previous entries whenever a new key/value pair is added.
Any idea why this happens?

Answer

Spark closures are serialized and executed in a separate context (remotely, when running on a cluster). Each task works on its own deserialized copy of the captured variable, so the driver's myMap is never updated locally.
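
This is the closure-capture pitfall described in the Spark programming guide under "Understanding closures". A minimal sketch of the same effect with a plain counter (assuming an existing SparkContext named sc; on a cluster the driver-side value stays 0):

var counter = 0
// Spark serializes this closure and ships it to the executors;
// each task increments its own deserialized copy of `counter`.
sc.parallelize(1 to 100).foreach(x => counter += x)
println(counter) // prints 0 on the driver, not 5050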

To get the data from the RDD back to the driver as a map, there is a built-in operation, collectAsMap, defined on pair RDDs (RDD[(K, V)]):

// map to (key, value) tuples first, since the question's RDD holds objects, not pairs
val myMap = myRDD.map(data => data.name -> data.time).collectAsMap()
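
For context, here is a minimal, self-contained sketch of the whole fix; the Data case class, field names, app name, and local master are assumptions standing in for the question's actual element type:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical element type matching the question's data.name / data.time usage
case class Data(name: String, time: Int)

object CollectAsMapExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CollectAsMapExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val myRDD = sc.parallelize(Seq(Data("A", 1), Data("B", 2), Data("C", 3)))

    // collectAsMap is only available on RDD[(K, V)] (via PairRDDFunctions),
    // so convert each element to a (name, time) tuple first.
    val myMap: scala.collection.Map[String, Int] =
      myRDD.map(d => d.name -> d.time).collectAsMap()

    println("Total Map : " + myMap) // e.g. Map(A -> 1, B -> 2, C -> 3)

    sc.stop()
  }
}

Note that collectAsMap, like collect, brings the entire dataset to the driver, so it is only appropriate when the result fits in driver memory.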