Todor Markov - 1 month ago
Scala Question

Scala: variadic UDF

I have a DataFrame with many columns.
I also have a function

def getFeatureVector(features:Array[String]) : Vector


that is fairly complex, but takes some strings and returns a Spark MLlib vector.

Now, I want to look at some columns in the DF (I don't know which beforehand), pass them to getFeatureVector, and add a new column containing the resulting vectors.

I have an array of the names of the columns I want to use, and I wrote a function that casts each of those columns to string and combines them into a single array column:

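import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{array, col}

val colNamesToEncode = Array("col1", "col2", "col3", "col4")
// Cast each selected column to string and pack them into one array column.
def getColsToEncode: Column = {
  val cols = colNamesToEncode.map(x => col(x).cast("string"))
  array(cols: _*)
}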


Finally, I try to make a udf and apply it to the DF:

val encoderUDF = udf(getFeatureVector _)
val cols = getColsToEncode()
data.withColumn(featuresColName,encoderUDF(cols))


but when I run that, I get java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit ()

How can I apply the function to the DF?

PS: I was using this answer (Spark UDF with varargs) as a guide while writing my code.

Answer

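Remove the () from the line below; that resolves the error. getColsToEncode is declared without a parameter list, so getColsToEncode() first evaluates to the Column and then applies () to that Column. The extra call goes to Column.apply, which receives the unit value () as its argument and tries to turn it into a literal, hence "Unsupported literal type class scala.runtime.BoxedUnit ()".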

From val cols = getColsToEncode()

To

val cols = getColsToEncode
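
For reference, here is a minimal end-to-end sketch of the corrected call. It rests on several assumptions that are not in the question: the body of getFeatureVector is a hypothetical stand-in (it encodes each string by its length), the vector type is taken from org.apache.spark.ml.linalg, the sample data, the two-column colNamesToEncode, and the "features" column name are made up, and the parameter is declared as Seq[String] rather than Array[String] because Spark typically hands array columns to Scala UDFs as a Seq.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{array, col, udf}

object VariadicUdfSketch {
  // Hypothetical stand-in for the real getFeatureVector:
  // each string is encoded as its length (null becomes 0.0).
  def getFeatureVector(features: Seq[String]): Vector =
    Vectors.dense(features.map(s => if (s == null) 0.0 else s.length.toDouble).toArray)

  // Assumed column names, chosen to match the sample data below.
  val colNamesToEncode = Array("col1", "col2")

  // Declared without a parameter list, so it is referenced without ().
  def getColsToEncode: Column =
    array(colNamesToEncode.map(c => col(c).cast("string")): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("variadic-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with mixed column types.
    val data = Seq((1, "a"), (22, "bcd")).toDF("col1", "col2")

    val encoderUDF = udf(getFeatureVector _)
    // Note: getColsToEncode, not getColsToEncode()
    val result = data.withColumn("features", encoderUDF(getColsToEncode))

    result.show(false)
    spark.stop()
  }
}
```

If getFeatureVector has to keep its Array[String] signature, one option is a thin wrapper such as udf((xs: Seq[String]) => getFeatureVector(xs.toArray)), so that the UDF itself still accepts the Seq that Spark passes in.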