Frankie - 1 month ago
Scala Question

Get class from Object at run time in Scala

import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.types.IntegerType
import org.apache.spark.sql.types.BooleanType
....
....
val TableSchema = Array(
  ("ID", IntegerType),
  ("Name", StringType),
  ("TNum", IntegerType),
  ("Handled", BooleanType),
  ("Value", StringType)
)


I have an array of schema information for a table, and I am trying to map it to an array of StructFields that can be used when creating a Spark DataFrame. After the transformation the array should look like this:

val struct = Array(
  StructField("ID", IntegerType),
  StructField("Name", StringType),
  StructField("TNum", IntegerType),
  StructField("Handled", BooleanType),
  StructField("Value", StringType))


So I am trying to write a method that converts each element to a StructField. This is my attempt:

def mapToStruct(arr: Array[(String, String, Object)]) = {
  val newArr = arr.map(ele => StructField(ele._1, ele._3))
  newArr
}


In this situation, I cannot get the class of StringType, BooleanType or IntegerType from the third element of the tuples passed to mapToStruct. The exception I get is: type mismatch; found: Object, required: org.apache.spark.sql.types.DataType. But if I change the parameter type to Array[(String, String, DataType)], it does not match the type of the variable.

My question is: what type should I choose for the third element of the tuple so that I can get the class of the object at run time?

Thanks in advance.

Answer

This should work:

import org.apache.spark.sql.types._

val tableSchema: Array[(String, DataType)] = Array(
  ("ID", IntegerType),
  ("Name", StringType),
  ("TNum", IntegerType),
  ("Handled", BooleanType),
  ("Value", StringType)
)

def mapToStruct(arr: Array[(String, DataType)]): Array[StructField] =
  arr.map(e => StructField(e._1, e._2))
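
If the third element really does arrive typed as Object (for example because the schema tuples come from a heterogeneous source), a minimal sketch of recovering the DataType at run time with a cast, and then building a DataFrame schema from it, could look like the following. The names mapToStructFromObject, rawSchema, rowRDD and the spark session in the usage comments are assumptions for illustration, not part of the original question:

import org.apache.spark.sql.types._

// Hypothetical variant: the type is only known statically as Object/AnyRef.
def mapToStructFromObject(arr: Array[(String, String, Object)]): Array[StructField] =
  arr.map { case (name, _, dt) =>
    // Cast back to DataType at run time; this throws a ClassCastException
    // if the element is not actually a DataType instance.
    StructField(name, dt.asInstanceOf[DataType])
  }

// Usage (assumed values):
// val schema = StructType(mapToStructFromObject(rawSchema))
// val df = spark.createDataFrame(rowRDD, schema)

A pattern match (case dt: DataType => ...) instead of the bare cast would let you fail with a clearer error, or skip entries whose third element is not a DataType.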