xuhai xuhai - 5 months ago 22
Scala Question

How to add a large struct column to a DataFrame

I want to add a struct column to a DataFrame, but the struct has more than 100 fields.

I learned that a case class can be converted to a struct column, but a case class is limited to 22 fields (our deployed Spark is 1.6.3 with Scala 2.10.4).

Can a normal class do this? Which functions or interfaces would I have to implement?

There is also org.apache.spark.sql.functions.struct, but it seems that it can't set the names of the struct's fields.
Thanks in advance.

Answer

"... but seems that it can't set the name of the fields of the struct."

You can. For example:

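import org.apache.spark.sql.functions._
import spark.implicits._  // enables $"..." (on Spark 1.6, use sqlContext in place of spark)

// Alias each column passed to struct to set the field name.
// The nested struct is not aliased, so it shows up under the default name col3.
spark.range(1).withColumn("foo",
   struct($"id".alias("x"), lit("foo").alias("y"), struct($"id".alias("bar")))
).printSchema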

root
 |-- id: long (nullable = false)
 |-- foo: struct (nullable = false)
 |    |-- x: long (nullable = false)
 |    |-- y: string (nullable = false)
 |    |-- col3: struct (nullable = false)
 |    |    |-- bar: long (nullable = false)
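
For a struct with 100+ fields, as in the question, the aliased columns don't have to be written out by hand. They can be generated from a list and passed to struct as varargs. Below is a minimal sketch, assuming Spark 2.x (SparkSession) as in the snippet above. On 1.6 the same struct call should work with sqlContext instead. The helper structOf, the object name WideStruct, and the column names c0..c119 / field_c0..field_c119 are made up for illustration.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object WideStruct {
  // Hypothetical helper: builds one struct column from
  // (sourceColumnName, structFieldName) pairs by aliasing each column.
  def structOf(fields: Seq[(String, String)]): Column =
    struct(fields.map { case (src, name) => col(src).alias(name) }: _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("wide-struct")
      .getOrCreate()

    // Assumed input: a DataFrame with 120 columns c0..c119, all derived from id.
    val sourceNames = (0 until 120).map(i => s"c$i")
    val wide = spark.range(1).select(
      sourceNames.zipWithIndex.map { case (n, i) => (col("id") + i).alias(n) }: _*
    )

    // Pack all 120 columns into a single struct column named foo,
    // renaming each field on the way in.
    val fields = sourceNames.map(n => n -> s"field_$n")
    val result = wide.select(structOf(fields).alias("foo"))

    result.printSchema()
    spark.stop()
  }
}
```

Since the field list is an ordinary Seq, it can come from df.columns, a config file, or anywhere else, so the 22-field case class limit never comes into play.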