COLD ICE - 3 years ago
Scala Question

Storing each element from each RDD to a new List

I am trying to store each element of each RDD in a new list. I can print the elements, but I cannot store them in a list, or even in a string variable.

This is the code:

...
var hashtags = joined_d
  .map(x => ((x._1, x._2._1._1, x._2._2, x._2._1._4), getHashTags(x._2._1._4)))
  .transform(rdd => rdd.map { case (x, list) =>
    if (list.length > 0)
      list.map(k => (k, (x._1, x._2, x._3, x._4, 1)))
    else
      List((x._1.toString, (x._1, x._2, x._3, x._4, 0)))
  })


Now when storing the elements like:

val arr = new ArrayBuffer[String]()
var hashtags_pair = hashtags.foreachRDD(rdd =>
  rdd.foreach(l => l.foreach(x => arr += x._1)))


Then printing the values out:

arr.foreach(println) // Not working


But printing the values directly, without storing them, works:

var hashtags_pair = hashtags.foreachRDD(rdd =>
  rdd.foreach(l => l.foreach(x => println(x._1)))) // It's working

Answer Source

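No, you can't fill a driver-side array from inside an RDD operation like this. An RDD is a distributed dataset, and the function passed to rdd.foreach runs on the executors in parallel. The driver serializes the closure of that function and sends it to the executors for execution.

Here the array variable arr is local to the driver. Each executor works on its own deserialized copy of it, so the appends are not guaranteed to reach the driver's arr, and arr.foreach(println) typically finds it empty. println appears to work because it runs where the data is, on the executors. To get the values onto the driver, bring them back explicitly, for example with rdd.collect().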
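One possible way around this is to let the executors do only the extraction and then pull the results back to the driver with collect() before appending. The sketch below is not taken from the question: the helper name collectKeys and its parameters are hypothetical, and it assumes the stream has the shape produced above (a DStream of lists of pairs whose first element is a String).

```scala
import org.apache.spark.streaming.dstream.DStream
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: appends the first element of every pair in the stream
// to a buffer that lives on the driver.
def collectKeys[V](stream: DStream[List[(String, V)]],
                   sink: ArrayBuffer[String]): Unit =
  stream.foreachRDD { rdd =>
    // flatMap runs on the executors and keeps only the String keys;
    // collect() ships those keys back to the driver as a local Array.
    val keys: Array[String] = rdd.flatMap(list => list.map(_._1)).collect()
    // The body of foreachRDD itself runs on the driver,
    // so this append modifies the real buffer.
    sink ++= keys
  }

// Usage, assuming `hashtags` and `arr` are defined as in the question:
//   collectKeys(hashtags, arr)
```

Note that the function passed to foreachRDD only runs once the streaming context has been started and a batch has been processed, so arr will still be empty if it is printed immediately after this call. Also, collect() pulls every element of the batch into driver memory, so this is only reasonable when each batch is small.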
