deadlock89 - 5 months ago
Scala Question

How to merge Arrays in RDD

I'm new to Spark. I have the following RDD[Array[(String, String, String)]]:

val r1 = sc.parallelize(Array(Array(("123","456","789"),("AAA","BBB","CCC")),Array(("DDD","EEE","FFF"),("E1","E2","E3"))))

I want to merge the arrays inside it into a single flat collection, like this:

Array((123,456,789), (AAA,BBB,CCC), (DDD,EEE,FFF), (E1,E2,E3))

I can do this with r1.reduce(_ ++ _). However, I want to use a transformation such as map rather than an action. Is it possible to do that? I'm using Spark 1.3.1.

Thank you


You can flatten the inner arrays with flatMap, which is a transformation:

import org.apache.spark.rdd.RDD

val res: RDD[(String, String, String)] = r1.flatMap(identity)
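
For anyone who wants to try this outside the shell, here is a minimal sketch of the same idea as a standalone program. It assumes Spark is on the classpath and runs in local mode. The names MergeArraysSketch and mergeArrays, the master setting local[2] and the app name are made up for illustration and are not from the question. It uses an explicit arr => arr.toSeq in place of identity, which should behave the same way here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MergeArraysSketch {
  // Hypothetical helper: flatMap emits every element of each inner array,
  // so an RDD of arrays becomes an RDD of tuples. This is a transformation.
  def mergeArrays(rdd: RDD[Array[(String, String, String)]]): RDD[(String, String, String)] =
    rdd.flatMap(arr => arr.toSeq)

  def main(args: Array[String]): Unit = {
    // Assumed local-mode configuration, for illustration only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("merge-arrays-sketch")
    val sc = new SparkContext(conf)
    try {
      // Same data as in the question.
      val r1 = sc.parallelize(Seq(
        Array(("123", "456", "789"), ("AAA", "BBB", "CCC")),
        Array(("DDD", "EEE", "FFF"), ("E1", "E2", "E3"))
      ))

      // Lazy: no job runs on this line.
      val merged = mergeArrays(r1)

      // collect() is an action; it is used here only to inspect the result.
      merged.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that the result of flatMap is still an RDD, not a local Array. You only need an action such as collect() if you want the elements back on the driver.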