Amardeep Singh - 1 month ago
Scala Question

Apache Spark using Scala: output file contains parentheses

I am joining two RDDs using the following code. The output file contains parentheses around the keys and values; I understand this is because I am simply dumping the key-value pairs to the output file. I am new to Scala and Spark. Please help, I need the output without the parentheses.

The logic is that I have two classes, Data1 and Data2, which hold the values of the input data, and I join them on the key (finalKey).

// Read the two input CSV files
val input1 = sc.textFile("file:///home//cloudera//Documents//flightdelays/flight_delays1.csv")
val input2 = sc.textFile("file:///home//cloudera//Documents//weather/sfo_weather.csv")

// Parse the flight-delay lines, keep only SFO, and key by finalKey
val data1 = input1.map(Test12.flightDelays.Data1.apply)
val data11 = data1.filter(s => s.airport1.toString.toUpperCase == "SFO")
                  .map(s => (s.finalKey, s.airport1))

// Parse the weather lines, keyed by finalKey with "max,min" as the value
val data2 = input2.map(Test12.flightDelays.Data2.apply)
val data22 = data2.map(s => (s.finalKey, s.max + "," + s.min))

// Join on finalKey (one partition) and write the result out
val data33 = data11.join(data22, 1)

data33.saveAsTextFile("file:///home//cloudera//Documents//11111.txt")
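The Data1 and Data2 classes (referenced above as Test12.flightDelays.Data1.apply and Data2.apply) are not shown. Based only on the fields used above (finalKey, airport1, max, min), they presumably look something like the following sketch; the CSV column positions are assumptions.

case class Data1(finalKey: String, airport1: String)

object Data1 {
  // Parse one CSV line; the column positions (0 = finalKey, 1 = airport1) are assumed
  def apply(line: String): Data1 = {
    val cols = line.split(",")
    new Data1(cols(0), cols(1))
  }
}

case class Data2(finalKey: String, max: String, min: String)

object Data2 {
  // Column positions assumed here as well
  def apply(line: String): Data2 = {
    val cols = line.split(",")
    new Data2(cols(0), cols(1), cols(2))
  }
}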


This is the output I am getting:

(20080103,(SFO,150,94))
(20080103,(SFO,150,94))
(20080103,(SFO,150,94))
(20080103,(SFO,150,94))
(20080103,(SFO,150,94))
(20080103,(SFO,150,94))
(20080103,(SFO,150,94))

Answer

To map a key-value pair into a comma-separated string:

data33.map { case (key, value) => s"$key,$value" }.saveAsTextFile(...)

To map any n-tuple:

data33.map(_.productIterator.mkString(",")).saveAsTextFile(...)
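Note, though, that productIterator only flattens the outermost tuple. If the value side is itself a tuple, as it appears to be for data33 after the join, the inner tuple's toString (parentheses included) still ends up in the output:

// For a joined pair like ("20080103", ("SFO", "150,94")), the iterator yields the
// key and the inner tuple as a single element, so the inner parentheses survive:
("20080103", ("SFO", "150,94")).productIterator.mkString(",")
// => 20080103,(SFO,150,94)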

I can't tell exactly what the type of data33 is, but the first form (which uses pattern matching) can be extended to any "hierarchy" of tuples or case classes (Products):

data33.map { case (a, (b, (c, d), e)) => s"$a,$b,$c,$d,$e" }.saveAsTextFile(...)
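Going by the code in the question, data33 should be an RDD[(String, (String, String))] of the form (finalKey, (airport1, "max,min")), so the concrete pattern would presumably look like this (the variable names airport and weather are just for illustration):

data33
  .map { case (key, (airport, weather)) => s"$key,$airport,$weather" }
  .saveAsTextFile(...)

This should write lines like 20080103,SFO,150,94 instead of (20080103,(SFO,150,94)).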