Todor Markov - 6 months ago
Scala Question

Scala: variadic UDF

I have a DataFrame with many columns.
I also have a function

def getFeatureVector(features:Array[String]) : Vector

that is fairly complex, but takes some strings and returns a Spark MLlib vector.

Now, I want to look at some columns in the DF (I don't know which beforehand), pass them to getFeatureVector, and add a new column containing the resulting vectors.

I have access to an array of the names of the columns I want to use, and I wrote a function that casts each of them to string and builds an array column:

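val colNamesToEncode = Array("col1", "col2", "col3", "col4")
def getColsToEncode: Column = {
  // cast every selected column to string, then pack them into one array column
  val cols = colNamesToEncode.map(x => col(x).cast("string"))
  array(cols: _*)
}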

Finally, I try to make a udf and apply it to the DF:

val encoderUDF = udf(getFeatureVector _)
val cols = getColsToEncode()

but when I run that, I get java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit ()

How can I apply the function to the DF?

PS: I was using this answer (Spark UDF with varargs) as a guide while writing my code.


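Just remove the () from the line below; that resolves the error. getColsToEncode is declared without a parameter list, so in getColsToEncode() the parentheses are applied to the returned Column, i.e. Column.apply(()), and Spark then fails trying to turn the Unit value () into a literal.

From:

val cols = getColsToEncode()

To:

val cols = getColsToEncode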
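
For reference, here is a minimal end-to-end sketch of the corrected call. Several things in it are assumptions and not from the question: the local SparkSession, the sample DataFrame df, the output column name "features", and the body of getFeatureVector, which is only a hypothetical stand-in (it encodes each string by its length). The sketch also assumes the vector type is org.apache.spark.ml.linalg.Vector, and it declares the parameter as Seq[String] instead of Array[String], since at least in Spark 2.x an array column reaches a Scala UDF as a Seq.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{array, col, udf}

object VariadicUdfSketch {
  // Hypothetical stand-in for the real getFeatureVector:
  // maps each input string to its length.
  def getFeatureVector(features: Seq[String]): Vector =
    Vectors.dense(features.map(_.length.toDouble).toArray)

  val colNamesToEncode = Array("col1", "col2", "col3", "col4")

  // Declared without a parameter list, so it must be called without ().
  def getColsToEncode: Column = {
    val cols = colNamesToEncode.map(x => col(x).cast("string"))
    array(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session and sample data, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("variadic-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a", 2.5, true)).toDF("col1", "col2", "col3", "col4")

    val encoderUDF = udf(getFeatureVector _)
    val cols = getColsToEncode // no parentheses here

    // Adds one vector-valued column built from the selected columns.
    df.withColumn("features", encoderUDF(cols)).show(false)

    spark.stop()
  }
}
```

Packing the columns into a single array column keeps the UDF at one argument, so the same UDF works however many column names are in colNamesToEncode.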