Guforu Guforu - 1 month ago 6
Scala Question

udf Function for DataType casting, Scala

I have the following DataFrame:

df.show()

+---------------+----+
|              x| num|
+---------------+----+
|[0.1, 0.2, 0.3]|   0|
|[0.3, 0.1, 0.1]|   1|
|[0.2, 0.1, 0.2]|   2|
+---------------+----+


The columns of this DataFrame have the following data types:

df.printSchema
root
 |-- x: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- num: long (nullable = true)


I am trying to convert the array of doubles inside the DataFrame to an array of floats. I do it with the following udf statement:

val toFloat = udf[(val line: Seq[Double]) => line.map(_.toFloat)]
val test = df.withColumn("testX", toFloat(df("x")))


This code does not work. Can anybody show me how to change the element type of an array column inside a DataFrame?

What I want is:

df.printSchema
root
 |-- x: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- num: long (nullable = true)


This question is based on the question "How to change the simple DataType in Spark SQL's DataFrame".

Answer

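Your udf is declared incorrectly. udf takes the function as an ordinary argument in parentheses; square brackets are for type parameters, and a lambda parameter cannot be declared with val. Write it as follows: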

val toFloat = udf((line: Seq[Double]) => line.map(_.toFloat))
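To get the schema shown in the question, where x itself holds floats, the result can be written back under the name "x" instead of "testX". The following is only a sketch under some assumptions: the object name CastArrayExample, the helper names withUdf and withCast, and the local SparkSession settings are made up for illustration and are not part of the original answer. It also shows a possible alternative without a udf, which casts the column using the type string "array<float>".

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object CastArrayExample {
  // The corrected udf from the answer.
  val toFloat = udf((line: Seq[Double]) => line.map(_.toFloat))

  // Hypothetical helper: reusing the name "x" overwrites the column
  // instead of adding a second column such as "testX".
  def withUdf(df: DataFrame): DataFrame =
    df.withColumn("x", toFloat(col("x")))

  // Hypothetical helper: the same conversion without a udf,
  // by casting the whole array column.
  def withCast(df: DataFrame): DataFrame =
    df.withColumn("x", col("x").cast("array<float>"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("cast-array-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Rebuild the DataFrame from the question.
    val df = Seq(
      (Seq(0.1, 0.2, 0.3), 0L),
      (Seq(0.3, 0.1, 0.1), 1L),
      (Seq(0.2, 0.1, 0.2), 2L)
    ).toDF("x", "num")

    withUdf(df).printSchema()
    withCast(df).printSchema()

    spark.stop()
  }
}
```

Both helpers should give x an element type of float. The nullability flags printed by printSchema may differ between the two variants, so check the output on your own Spark version if that matters.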