Feynman27 - 1 year ago
Scala Question

Concatenate String to each element of a List in a Spark dataframe with Scala

I have two columns in a Spark dataframe: one is a String, and the other is a List of Strings. How do I create a new column that concatenates the String in column 1 with each element of the list in column 2, producing another list in column 3?

For example, if column 1 is "a" and column 2 is ["A","B"], I'd like the output in column 3 of the dataframe to be ["aA","aB"].

So far, I have:

val multiplier = (x1: String, x2: Seq[String]) => {x1+x2}
val multiplierUDF = udf(multiplier)
val df2 = df1
.withColumn("col3", multiplierUDF(df1("col1"),df1("col2")))

which gives

Answer

I suggest you try your UDFs outside of Spark and get them working on local values first. If you do:

val multiplier = (x1: String, x2: Seq[String]) => {x1+x2}
multiplier("a", Seq("A", "B"))

// output
res1: String = aList(A, B)

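You'll see that multiplier doesn't do what you want: x1+x2 appends the string representation of the whole Seq to x1 instead of concatenating x1 with each element.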

I think you're looking for:

val multiplier = (x1: String, x2: Seq[String]) => x2.map(x1+_)
multiplier("a", Seq("A", "B"))

res2: Seq[String] = List(aA, aB)
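
If it helps, here is a minimal sketch of the same function that can be checked without Spark. The object name ConcatExample and the names prefixEach and prefixEachSafe are hypothetical, and the null-tolerant variant is my own addition, assuming either column might contain nulls in your data.

```scala
object ConcatExample {
  // Same logic as the corrected multiplier above:
  // prefix every element of x2 with x1.
  val prefixEach: (String, Seq[String]) => Seq[String] =
    (x1, x2) => x2.map(x1 + _)

  // Hypothetical null-tolerant variant (an assumption, not part of the
  // original answer): return null when either input is null instead of
  // throwing a NullPointerException or producing "nullA"-style strings.
  val prefixEachSafe: (String, Seq[String]) => Seq[String] =
    (x1, x2) => if (x1 == null || x2 == null) null else x2.map(x1 + _)
}
```

Either function value can be passed to udf(...) in place of multiplier in the question's code, for example udf(ConcatExample.prefixEach), and then applied with withColumn as before.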