Scala Question

Spark map with conditions / pattern matching

I have an RDD from reading a CSV:

val sampleRDD = sc.textFile(path)


The CSV looks like this:

col1 col2 col3 col4
--------------------
val1 val2 val3 val4
val5 val6 val7 val8
val9 val10 val3 val12
val13 val14 val15 val16
val17 val18 val3 val20
val21 val22 val7 val24


For col3 I have multiple values that repeat, and I have a mapping for some of them: for val3 I want the output value to be A, and for val7 I want the output value to be B. I want the output to look like the one below.

Unfortunately, we still have to use Spark 1.0.0 and need to work with RDDs.

col1 col2 col3 col4
--------------------
val1 val2 A val4
val5 val6 B val8
val9 val10 A val12
val13 val14 val15 val16
val17 val18 A val20
val21 val22 B val24


How do I go about doing such a transformation?

Answer

You can do this by making a UDF and applying it to that column (note that UDFs operate on DataFrames, which only arrived in Spark 1.3; see the RDD-only sketch at the end for Spark 1.0.0). Your function should look something like this:

def getValue(s: String) = s match {
  case "val3" => "A"
  case "val7" => "B"
  case _ => s
}
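
A quick sanity check of the mapping, using values from the sample data:

getValue("val3")  // "A"
getValue("val7")  // "B"
getValue("val15") // "val15" -- unmapped values pass through unchanged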

Then make a udf out of this function (the udf helper lives in org.apache.spark.sql.functions):

import org.apache.spark.sql.functions.udf

val valueUdf = udf(getValue _)
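
Since sampleRDD is a plain RDD of text lines, the next step needs the data in DataFrame form first. Here is a minimal sketch, assuming Spark 1.3+ (where DataFrames exist), a SQLContext named sqlContext, whitespace-delimited lines, and the column names from the sample; sampleDF is a name introduced here for illustration:

import sqlContext.implicits._

// Assumes the file contains only data rows, split on whitespace.
val sampleDF = sampleRDD
  .map(_.split("\\s+"))
  .map(a => (a(0), a(1), a(2), a(3)))
  .toDF("col1", "col2", "col3", "col4")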

Now apply this UDF to get the new value (withColumn is a DataFrame method, so it runs on the DataFrame built above, not on the raw RDD):

val result = sampleDF.withColumn("col3", valueUdf(sampleDF("col3")))

This will give your desired result!

P.S.: The code is not tested, but it should work!
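
One more caveat: the question is pinned to Spark 1.0.0, where neither DataFrames nor the udf helper exist. Here is a minimal sketch for that case, reusing the same getValue pattern match directly on the RDD (again assuming whitespace-delimited lines, with col3 as the third field):

// Spark 1.0.0-compatible: rewrite the third field of each line
// with the same pattern match, working directly on the RDD.
val transformed = sampleRDD.map { line =>
  val fields = line.split("\\s+")
  if (fields.length > 2) fields(2) = getValue(fields(2))
  fields.mkString(" ")
}

Lines with fewer than three fields pass through unchanged, and so do header or separator rows, since they match none of the cases.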
