duckertito - 27 days ago
Scala Question

Creation of RDD[LabeledPoint]: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double

I have written the following code to convert a SQL DataFrame df to RDD[LabeledPoint]:

val targetInd = df.columns.indexOf("myTarget")
val ignored = List("myTarget")
val featInd = df.columns.diff(ignored).map(df.columns.indexOf(_))

df.printSchema

val dfLP = df.rdd.map(r => LabeledPoint(
r.getDouble(targetInd),
Vectors.dense(featInd.map(r.getDouble(_)).toArray)
))


The schema looks like this:

root
|-- myTarget: long (nullable = true)
|-- var1: long (nullable = true)
|-- var2: double (nullable = true)


When I run dfLP.foreach(l => l.label), the following error occurs:

java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double


How can I cast the label to double? I assume the other features can be either double or long, is that right? If not, I will also need to cast the remaining features to double.

Answer

You could try casting all columns to double before mapping. Using foldLeft should do the trick:

df.columns.foldLeft(df) {
  // Look the column up by its name (the variable colName, not the string "colName")
  (newDF, colName) => newDF.withColumn(colName, newDF(colName).cast("double"))
}

(Sorry, I don't have time to test it for the moment)
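
As for why the error appears at all: a long column holds boxed java.lang.Long values, and unboxing one directly as a Double fails, which appears to be what getDouble does on such a column. Below is a minimal sketch in plain Scala (no Spark needed) that reproduces the failing cast and shows a widening alternative. The object NumericCast and the helper toDouble are hypothetical names made up for illustration; they are not part of Spark.

```scala
import scala.util.Try

object NumericCast {
  // Hypothetical helper (not a Spark API): widen any boxed numeric value to Double.
  // Non-numeric input, including null, is rejected with an exception.
  def toDouble(value: Any): Double = value match {
    case n: java.lang.Number => n.doubleValue()
    case other => throw new IllegalArgumentException(s"Not a numeric value: $other")
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the value a row holds for a `long` column.
    val boxedLong: Any = 42L

    // Attempt the direct cast; wrapped in Try so the sketch keeps running.
    val direct = Try(boxedLong.asInstanceOf[Double])
    println(s"direct cast failed: ${direct.isFailure}")

    // Going through java.lang.Number works for long and double values alike.
    println(toDouble(boxedLong))
    println(toDouble(0.5))
  }
}
```

Inside the original row mapping, the same idea could be written as r.getAs[Number](i).doubleValue() in place of r.getDouble(i), assuming the columns contain no nulls (the schema marks them as nullable, so that is worth checking first). Casting the columns up front, as in the foldLeft above, keeps the mapping code unchanged.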