
Case statements in Spark

I'm writing Spark code where I need to convert an RDD of type (String, (String, String)) to ((String, String), String).

I have the following input text file:

Language,Language-code,TotalViewsInThatLang
English,en,10965376,"Main_Page",2938355
Russian,ru,1925718,"%D0%97%D0%B0%D0%B3%D0%BB,915495
Spanish,es,1010810,"Wikipedia:Portada",13603


I have created an RDD as follows:

val line = sc.textFile(inputFile)
val nrdd = line.map(x => (x.split(",")(0), (x.split(",")(1), x.split(",")(2))))
nrdd: org.apache.spark.rdd.RDD[(String, (String, String))] = MapPartitionsRDD[2] at map at <console>:26


From this I want to use a case statement to create an RDD of type ((String, String), String).

How can I do this with a case statement inside map?

EDIT

I get the following error when I try to use a case statement:

scala> val frdd = nrdd.map( {case(x,(y,z))=>((x,y),z))})
<console>:1: error: ';' expected but ')' found.
val frdd = nrdd.map({case(x,(y,z))=>((x,y),z))})
^

Answer

Unless I misunderstood your question, you want this:

val list: List[(String, (String, String))] = List(("a1", ("b1", "c1")), ("a2", ("b2", "c2")))
val res = list.map { case (a, (b, c)) => ((a, b), c) }

println(res) // List(((a1,b1),c1), ((a2,b2),c2))
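The same pattern applies directly to your RDD. The compile error in your EDIT comes from a stray extra closing parenthesis after ((x,y),z); with it removed, something like this should work (assuming nrdd is the RDD[(String, (String, String))] from your question):

// nrdd: RDD[(String, (String, String))] built from the input file
val frdd = nrdd.map { case (x, (y, z)) => ((x, y), z) }
// frdd: org.apache.spark.rdd.RDD[((String, String), String)]

Note that case clauses must be wrapped in braces ({ ... }), which your snippet already does; the extra ) before the closing } is the only problem. For reference, the same conversion without pattern matching would be nrdd.map(t => ((t._1, t._2._1), t._2._2)), though the case form is easier to read.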