Ashwin Swarup - 10 months ago 63

Scala Question

I have an RDD of `org.apache.spark.mllib.linalg.Vector`, where each vector holds three values [Int Int Int].

I am trying to convert this into a dataframe using this code

`import sqlContext.implicits._`

`import org.apache.spark.sql.types.StructType`

`import org.apache.spark.sql.types.StructField`

`import org.apache.spark.sql.types.DataTypes`

`import org.apache.spark.sql.types.ArrayData`

`vectrdd` is an RDD whose elements are of type `org.apache.spark.mllib.linalg.Vector`.

`val vectarr = vectrdd.toArray()`

`case class RFM(Recency: Integer, Frequency: Integer, Monetary: Integer)`

`val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF()`

I am getting the following error:

`warning: fruitless type test: a value of type org.apache.spark.mllib.linalg.Vector cannot also be a Array[T]`

`val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF()`

`error: pattern type is incompatible with expected type;`

`found : Array[T]`

`required: org.apache.spark.mllib.linalg.Vector`

`val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF()`

The second method I tried is this:

`val vectarr=vectrdd.toArray().take(2)`

`case class RFM(Recency: String, Frequency: String, Monetary: String)`

`val df = vectrdd.map { case (t0, t1, t2) => RFM(p0, p1, p2) }.toDF()`

I got this error:

`error: constructor cannot be instantiated to expected type;`

`found : (T1, T2, T3)`

`required: org.apache.spark.mllib.linalg.Vector`

`val df = vectrdd.map { case (t0, t1, t2) => RFM(p0, p1, p2) }.toDF()`

I used this example as a guide: Convert RDD to Dataframe in Spark/Scala

Answer

`vectarr` will have the type `Array[org.apache.spark.mllib.linalg.Vector]`, so in the pattern match you cannot match `Array(p0, p1, p2)`, because each element being matched is a Vector, not an Array.
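To illustrate the point about the `Array(...)` pattern, here is a minimal plain-Scala sketch with no Spark involved. The value `row` is a hypothetical stand-in for what one element looks like after a Vector has been converted with `toArray`, and the field names in the output string are only for illustration.

```scala
// Hypothetical stand-in for a single element after calling toArray on a Vector.
val row: Array[Double] = Array(3.0, 12.0, 250.0)

// The Array(...) extractor applies here because the value being matched is an Array.
// Matching a Vector against the same pattern is what the compiler rejects.
val described: String = row match {
  case Array(recency, frequency, monetary) =>
    s"recency=$recency frequency=$frequency monetary=$monetary"
  case other =>
    // Reached only if the array does not have exactly three elements.
    s"unexpected length ${other.length}"
}
```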

Also, you should not do `val vectarr = vectrdd.toArray()`: this collects the RDD into a local Array on the driver, and then the final call to `toDF` will not work, since the `toDF` call here expects an RDD, not an Array.

The correct line would be (provided you change `RFM` to have Doubles):

```
val df = vectrdd.map(_.toArray).map { case Array(p0, p1, p2) => RFM(p0, p1, p2)}.toDF()
```

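or, equivalently, replace `val vectarr = vectrdd.toArray()` (which produces `Array[Vector]`) with `val arrayRDD = vectrdd.map(_.toArray)` (producing `RDD[Array[Double]]`). Note that `Vector.toArray` is declared without parentheses, so it is called as `_.toArray`, not `_.toArray()`.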
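Putting the pieces together, the approach above could be sketched as a small standalone application. This assumes a Spark 1.x style setup with `SQLContext`, as in the question. The object name `VectorRddToDataFrame`, the helper `toRfmDataFrame`, the app name, the local master setting, and the sample values are all made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

// Fields are Double because Vector.toArray yields Array[Double].
// The case class sits at the top level so Spark can derive its schema.
case class RFM(Recency: Double, Frequency: Double, Monetary: Double)

object VectorRddToDataFrame {

  // Hypothetical helper: converts each Vector to an Array, then to an RFM row.
  // The pattern assumes exactly three elements per vector; any other length
  // would fail with a MatchError when the job runs.
  def toRfmDataFrame(sqlContext: SQLContext, vectrdd: RDD[Vector]): DataFrame = {
    import sqlContext.implicits._
    vectrdd
      .map(_.toArray)
      .map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }
      .toDF()
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick test run.
    val conf = new SparkConf().setAppName("rfm-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Made-up sample data standing in for the question's vectrdd.
    val vectrdd: RDD[Vector] = sc.parallelize(Seq(
      Vectors.dense(3.0, 12.0, 250.0),
      Vectors.dense(10.0, 2.0, 40.0)
    ))

    val df = toRfmDataFrame(sqlContext, vectrdd)
    df.printSchema()
    df.show()

    sc.stop()
  }
}
```

In `spark-shell`, where `sc` and `sqlContext` already exist, only the case class and the body of `toRfmDataFrame` would be needed.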