Scala Question

scala.collection.mutable.ArrayBuffer cannot be cast to java.lang.Double (Spark)

I have a DataFrame like this:

root
|-- midx: double (nullable = true)
|-- future: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _1: long (nullable = false)
| | |-- _2: long (nullable = false)


I am trying to transform it using this code:

val T = withFfutures.where($"midx" === 47.0).select("midx", "future").collect().map((row: Row) =>
  Row {
    row.getAs[Seq[Row]]("future").map { case Row(e: Long, f: Long) =>
      (row.getAs[Double]("midx"), e, f)
    }
  }
).toList

so that it ends up with this schema:

root
|-- id: double (nullable = true)
|-- event: long (nullable = true)
|-- future: long (nullable = true)


So the plan is to flatten the array of (event, future) structs into a DataFrame that has those two fields as columns. I am trying to turn T into a DataFrame like this:

import org.apache.spark.sql.types.{StructType, StructField, DoubleType, LongType}

val schema = StructType(Seq(
  StructField("id", DoubleType, nullable = true),
  StructField("event", LongType, nullable = true),
  StructField("future", LongType, nullable = true)
))

val df = sqlContext.createDataFrame(context.parallelize(T), schema)


But when I try to look into df, I get this error:

java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to java.lang.Double
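
The message is quite literal: createDataFrame matches each Row's fields to the schema positionally, and the Row { ... } wrapper above produces a one-field Row whose only value is the whole mapped collection. A minimal sketch of the mismatch (illustrative values, not output from the actual data):

import org.apache.spark.sql.Row

// What the schema (DoubleType, LongType, LongType) expects per row:
val ok = Row(47.0, 1L, 2L)

// What the map above actually builds: a single field holding the whole
// collection, which Spark materializes as an ArrayBuffer -- hence the
// failed cast to java.lang.Double when it reads field 0.
val broken = Row(Seq((47.0, 1L, 2L), (47.0, 3L, 4L)))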

Answer

After a while I found what the problem was: first and foremost, the array of structs in the column has to be read as a Seq[Row] and flattened into plain tuples, rather than wrapped in another Row. So the final code to build the final DataFrame should look like this:

val T = withFfutures.select("midx", "future").collect().flatMap { (row: Row) =>
  // Read the array column as Seq[Row], then flatten each (event, future)
  // struct into a plain (id, event, future) tuple.
  row.getAs[Seq[Row]]("future").map { case Row(e: Long, f: Long) =>
    (row.getAs[Double]("midx"), e, f)
  }
}.toList

// .toDF on an RDD of tuples needs the SQLContext implicits in scope:
import sqlContext.implicits._

val all = context.parallelize(T).toDF("id", "event", "future")
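
As a side note, the same flattening can also be done without collect()ing everything to the driver, by exploding the array column directly. A sketch, assuming the sqlContext.implicits._ import from above is in scope for the $ syntax:

import org.apache.spark.sql.functions.explode

// One output row per (event, future) struct in the array; then lift the
// struct fields into top-level columns.
val flattened = withFfutures
  .select($"midx".as("id"), explode($"future").as("fut"))
  .select($"id", $"fut._1".as("event"), $"fut._2".as("future"))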