smeeb - 1 month ago
Scala Question

Supporting nested structures with Spark StructType

The javadocs for Spark's StructType#add method show that the second argument needs to be a class that extends DataType.

I have a situation where I need to add a fairly complicated MapType as a field on a StructType.

Specifically, this MapType field is a map of several nested structures:

Map<String,Map<Integer,Map<String,String>>>


Hence it is a map with 2 nested/inner maps. The inner-most map is of type Map<String,String> (so in Spark parlance, MapType[StringType,StringType]).

The middle map is of type Map<Integer,Map<String,String>> (so again in Spark parlance, MapType[IntegerType,MapType[StringType,StringType]]).

How do I specify this complex nested structure of maps when calling the StructType#add method?


That is, I want to be able to do something like this:

var myStruct : StructType = new StructType()
myStruct.add("complex-o-map",
MapType[StringType,MapType[IntegerType,MapType[StringType,StringType]]])


However, it looks like I can only add the single outer-most MapType:

var myStruct : StructType = new StructType()
myStruct.add("complex-o-map", MapType)


This makes me sad. How do I specify my nested map structure during the call to add(...)?

Answer

The "types" expected by MapType (e.g. StringType, MapType) aren't really types in the Scala sense; they are objects, so you should pass them as constructor arguments and not as type parameters. In other words, use () instead of []:

import org.apache.spark.sql.types._

val myStruct = new StructType().add("complex-o-map",
  MapType(StringType, MapType(IntegerType, MapType(StringType, StringType))))

myStruct.printTreeString()
// prints:
// root
// |-- complex-o-map: map (nullable = true)
// |    |-- key: string
// |    |-- value: map (valueContainsNull = true)
// |    |    |-- key: integer
// |    |    |-- value: map (valueContainsNull = true)
// |    |    |    |-- key: string
// |    |    |    |-- value: string (valueContainsNull = true)
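
If the one-liner is hard to read, the same schema can be built from the inside out. The sketch below is only an illustration: the object name NestedMapSchema and the val names (innerMap, middleMap, outerMap, strictInner) are made up for this example and are not part of Spark. It also shows two things that are easy to trip over: add returns a new StructType rather than modifying the one it is called on (so calling myStruct.add(...) as in the question and ignoring the result has no effect), and the two-argument MapType(...) defaults valueContainsNull to true.

```scala
import org.apache.spark.sql.types._

// Hypothetical wrapper object, used here only to make the sketch self-contained.
object NestedMapSchema {
  // Build the type from the inside out; each MapType(...) call creates a DataType instance.
  val innerMap: MapType  = MapType(StringType, StringType)
  val middleMap: MapType = MapType(IntegerType, innerMap)
  val outerMap: MapType  = MapType(StringType, middleMap)

  // StructType is immutable: add returns a new StructType and leaves `empty` unchanged.
  val empty: StructType   = new StructType()
  val withMap: StructType = empty.add("complex-o-map", outerMap)

  // The three-argument form sets valueContainsNull explicitly.
  val strictInner: MapType = MapType(StringType, StringType, valueContainsNull = false)

  def main(args: Array[String]): Unit = {
    println(empty.fields.length)                // number of fields on the original struct
    println(withMap.fieldNames.mkString(", "))  // field names on the struct returned by add
    withMap.printTreeString()
  }
}
```

So if you want to keep the var from the question, reassign it: myStruct = myStruct.add(...). Otherwise prefer a val and chain the add calls, as in the answer above.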