blunderboy - 2 months ago
Scala Question

How to check if a column in a DataFrame is of StructType

I want to know whether a column in a DataFrame is of StructType. I have the DataFrame's schema and am trying to use the following code:

df.schema.apply(1).dataType match {
  case StringType => // Do something
  case ? => // How to check whether the column at index 1 is of StructType?
}


For example, consider this case:

import org.apache.spark.sql.types._

val personStructType =
  StructType(
    StructField("name", StringType, nullable = true, metadata = new MetadataBuilder().putBoolean("isPrimary", false).build) ::
    StructField("age", IntegerType, nullable = true, metadata = new MetadataBuilder().putBoolean("isPrimary", false).build) ::
    StructField("gender", StringType, nullable = true, metadata = new MetadataBuilder().putBoolean("isPrimary", false).build) ::
    Nil
  )

val idStructType =
  StructType(
    StructField("domain", StringType, nullable = true, metadata = new MetadataBuilder().putBoolean("isPrimary", false).build) ::
    StructField("id", StringType, nullable = true, metadata = new MetadataBuilder().putBoolean("isPrimary", false).build) ::
    Nil
  )

val schema =
  StructType(
    StructField("a", StringType, nullable = true, metadata = new MetadataBuilder().putBoolean("isPrimary", true).build) ::
    StructField("person", personStructType, nullable = true, metadata = new MetadataBuilder().putBoolean("isPrimary", false).build) ::
    StructField("identifier", idStructType, nullable = true, metadata = new MetadataBuilder().putBoolean("isPrimary", false).build) ::
    Nil
  )

val a0 = schema.apply(0).dataType
a0 == StringType // Result is true

val a1 = schema.apply(1).dataType
a1 == StructType // Result is false


Because a1 is
StructType(StructField(name,StringType,true), StructField(age,IntegerType,true), StructField(gender,StringType,true))


How do I check whether a1 is a StructType?

Answer

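When you write a1 == StructType or case StructType, you are comparing with the value called StructType, which is the companion object of the StructType class, not an instance of that class. The comparison a0 == StringType works only because StringType is itself a singleton object, so every string column's data type is that same value. A StructType instance carries its own fields, so it is never equal to the companion object.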
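The distinction can be reproduced without Spark. The following is a minimal sketch in plain Scala; Kind, Struct, Text and CompanionDemo are hypothetical stand-ins for illustration, not Spark's real classes.

```scala
// Hypothetical stand-ins that mirror the situation in the question.
sealed trait Kind
// A case class: the name `Struct` refers to the class AND to its companion object.
case class Struct(fields: List[String]) extends Kind
// A singleton: the name `Text` refers to the one and only value of this type.
case object Text extends Kind

object CompanionDemo {
  val s: Kind = Struct(List("name", "age"))
  val t: Kind = Text

  // Comparing with a singleton object works, because `t` is that object.
  val textEquals: Boolean = t == Text

  // Here `Struct` is the companion object, so an instance is never equal to it.
  val structEquals: Boolean = s == Struct

  // A type test asks the question that was actually intended.
  val structIsInstance: Boolean = s.isInstanceOf[Struct]
}
```

The compiler may warn that the second comparison can never be true, which is a useful hint that a value is being compared with an unrelated object.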

You need to match against the type instead: case struct: StructType (or case StructType(fields)), just like you write case x: String and not case String.
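Applied to the question, a type pattern could look like the sketch below. It assumes spark-sql is on the classpath; ColumnKind and describe are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.types._

object ColumnKind {
  // Classify a column's data type (hypothetical helper, not part of Spark).
  def describe(dataType: DataType): String = dataType match {
    // StringType is a singleton object, so matching by value is fine here.
    case StringType => "string"
    // A type pattern matches any StructType instance and binds it to `struct`.
    case struct: StructType =>
      s"struct with fields: ${struct.fieldNames.mkString(", ")}"
    // Everything else falls through to the default case.
    case other => s"other: ${other.typeName}"
  }
}
```

With the schema from the question, the call would be ColumnKind.describe(schema(1).dataType); note that the match is on the field's dataType, not on the StructField itself. If only a Boolean is needed, schema(1).dataType.isInstanceOf[StructType] asks the same question without a match.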