Finkelson - 2 months ago

Spark UDF returns the length of the field name instead of the length of the value

Consider the code below

object SparkUDFApp {
  def main(args: Array[String]) {

    // Spark and SQL contexts
    val sc  = new org.apache.spark.SparkContext(new org.apache.spark.SparkConf())
    val ctx = new org.apache.spark.sql.SQLContext(sc)

    // read the JSON file and register it as a temp table for SQL queries
    val df = ctx.read.json(".../example.json")
    df.registerTempTable("example")

    // UDF that returns the length of a string
    val fn = (_: String).length // % 10
    ctx.udf.register("len10", fn)

    val res0 = ctx sql "SELECT len10('id') FROM example LIMIT 1" map { _ getInt 0 } collect
  }
}


JSON example


The code should return the length of the field's value from the JSON (e.g. the value of 'id' is 18 characters long). But instead of returning 18 it returns 2, which I suppose is the length of the string 'id'.

So my question is: how do I rewrite the UDF call to fix this?


The problem is that you are passing the string 'id' as a literal to your UDF, so it is interpreted as a string literal rather than a column reference (note that 'id' has 2 letters, which is why you get 2 back). To fix it, just change how you formulate the SQL query:


val res0 = ctx sql "SELECT len10(id) FROM example LIMIT 1" map {_ getInt 0} collect

// Or alternatively, use the DataFrame API directly
import org.apache.spark.sql.functions.udf
val len10 = udf((word: String) => word.length)
df.select(len10(df("id")).as("length")).show()
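For reference, here is a minimal, self-contained sketch of both approaches that contrasts passing a column with passing a quoted literal. The object name LenDemo, the local master, the file path, and the assumption that id is a string field in the JSON are all illustrative, not taken from the question:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{udf, col, lit}

object LenDemo {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("LenDemo").setMaster("local[*]"))
    val ctx = new SQLContext(sc)

    // hypothetical path; assumes the JSON has a string field `id`
    val df = ctx.read.json("/path/to/example.json")
    df.registerTempTable("example")

    // same UDF, available both to SQL and to the DataFrame API
    ctx.udf.register("len10", (word: String) => word.length)
    val len10 = udf((word: String) => word.length)

    // SQL: an unquoted identifier refers to the column,
    // a single-quoted string is a literal (so len10('id') is always 2)
    ctx.sql("SELECT len10(id) AS len_of_value, len10('id') AS len_of_literal FROM example LIMIT 1").show()

    // DataFrame API equivalent
    df.select(
      len10(col("id")).as("len_of_value"),
      len10(lit("id")).as("len_of_literal")  // always 2
    ).show()
  }
}

The key point either way is that col("id") (or a bare id in SQL) resolves to the column's values per row, while lit("id") (or 'id' in SQL) is just the two-character string "id".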