Rãã Møó - 2 months ago
Scala Question

Split and choose in Scala



I found some explanations of how to do this, but I still can't get it to work.

I want to split the following data:

val data=sc.textFile("hdfs://ncdc/isd-history.csv")


The data has the form:
("949999","00338","PORTLAND (CASHMORE)","AS","","","-38.320","+141.480","+0081.0","19690724","19781113")


I want to split the data and take only the 1st field (949999) and the 3rd field (PORTLAND (CASHMORE)).


I have tried this:

val RDD = (data.filter(s => (s.split(',')(0) , s.split(',')(2))))


But it doesn't work.

Thank you.

Answer

RDD.filter filters records, not "columns": it expects a function from the record type (String, I assume, in this case) to Boolean, and drops every record for which that function returns false.
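To make the difference in function types concrete, here is a minimal sketch that uses a plain Scala List instead of an RDD, so it runs without Spark. The object name and the sample lines are made up for illustration; filter and map on an RDD take functions of the same shapes.

```scala
object FilterVsMap {
  // Made-up sample records, not taken from the real file.
  val lines: List[String] = List("a,b,c", "d,e,f", "g,h,i")

  // filter takes String => Boolean: it keeps or drops whole records.
  val kept: List[String] = lines.filter(s => s.split(',')(0) != "d")

  // map takes String => anything: it transforms every record,
  // here into a tuple of the 1st and 3rd fields.
  val pairs: List[(String, String)] =
    lines.map(s => (s.split(',')(0), s.split(',')(2)))
}
```

Passing a function that returns a tuple to filter, as in the question, does not compile because a tuple is not a Boolean.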

You're trying to transform each record from a String into a tuple (while "filtering" out parts of that string), so you should use RDD.map instead of RDD.filter:

val RDD = data.map(s => (s.split(',')(0), s.split(',')(2)))

Or better yet, split each line only once:

val RDD = data.map(_.split(',')).map(arr => (arr(0), arr(2)))
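Note that a plain split on ',' leaves the double quotes around each field. The sketch below shows one way to strip them, assuming the raw lines look like the sample in the question without the enclosing parentheses and that no field contains a comma of its own. The object and helper names are hypothetical, and the code uses a plain Seq so it runs without Spark.

```scala
object SplitExample {
  // Hypothetical helper: returns the 1st and 3rd fields of one CSV line,
  // with the surrounding double quotes removed.
  def firstAndThird(line: String): (String, String) = {
    // Naive split: breaks if a quoted field itself contains a comma.
    val fields = line.split(',').map(_.trim.stripPrefix("\"").stripSuffix("\""))
    (fields(0), fields(2))
  }

  // Sample line in the shape shown in the question (assumed raw form).
  val sample: String =
    "\"949999\",\"00338\",\"PORTLAND (CASHMORE)\",\"AS\",\"\",\"\"," +
      "\"-38.320\",\"+141.480\",\"+0081.0\",\"19690724\",\"19781113\""

  // Same shape as the RDD version: map each line to a tuple.
  val result: Seq[(String, String)] = Seq(sample).map(firstAndThird)
}
```

With the RDD from the question, the equivalent call would presumably be data.map(SplitExample.firstAndThird). If station names can contain commas, a proper CSV parser is a safer choice than split.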