satyambansal117 - 24 days ago
Scala Question

Get struct-typed elements of a Row by name in Spark (Scala)

In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract struct values by name?

I am using the code below to extract values by name, but I don't know how to read a struct value.

If the values had been of type String, we could have done this:

val resultDF = joinedDF.rdd.map { row =>
  val id = row.getAs[Long]("id")
  val values = row.getAs[String]("slotSize")
  // the value of slotSize is used as the name of the column to read
  val fields = row.getAs[String](values)
  (id, values, fields)
}.toDF("id", "values", "fields")


But in my case, the column named by values has the schema below:

|-- v1: struct (nullable = true)
|    |-- level1: string (nullable = true)
|    |-- level2: string (nullable = true)
|    |-- level3: string (nullable = true)
|    |-- level4: string (nullable = true)
|    |-- level5: string (nullable = true)


What should I replace this line with to make the code work, given that the value has the above structure?

row.getAs[String](values)

Answer

You can access the struct's elements by first extracting another Row from the top-level Row (Spark models a struct as a nested Row). Here "struct" stands for the name of your struct column:

import org.apache.spark.sql.Row
val level1 = row.getAs[Row]("struct").getAs[String]("level1")
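To show how that line fits into the original rdd.map, here is a minimal sketch rather than a definitive implementation. It assumes the struct column is named v1, as in the schema from the question. The object StructFieldByName, the case classes Levels and Record, the helper level1Of, and the local SparkSession are hypothetical names made up for this illustration. Because both the struct and its fields are nullable, the helper wraps each lookup in Option instead of calling getAs on a possibly null Row.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object StructFieldByName {
  // Hypothetical case classes, used only to build sample data with a struct column "v1".
  case class Levels(level1: String, level2: String)
  case class Record(id: Long, v1: Option[Levels])

  // Reads the nested field "level1" from the struct column "v1".
  // Returns None when the struct itself or the field is null.
  def level1Of(row: Row): Option[String] =
    Option(row.getAs[Row]("v1")).flatMap(s => Option(s.getAs[String]("level1")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying the snippet out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("struct-field-by-name")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for joinedDF from the question: one row with a struct, one with a null struct.
    val joinedDF = Seq(
      Record(1L, Some(Levels("a", "b"))),
      Record(2L, None)
    ).toDF()

    // Same shape as the code in the question, but reading a struct field by name.
    val resultDF = joinedDF.rdd.map { row =>
      (row.getAs[Long]("id"), level1Of(row))
    }.toDF("id", "level1")

    resultDF.show()
    spark.stop()
  }
}
```

If you only need the nested value as a column, selecting it by its dotted path, for example joinedDF.select("id", "v1.level1"), should give the same result without the round trip through an RDD.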