lserlohn - 24 days ago
Scala Question

How can I convert one column of data to a vector using Spark Scala?

I am using Spark with Scala to process data, and I have a question I couldn't figure out. I have a DataFrame with a single column:

data
1
2
3
4
5


I want to turn it into a single vector:
[1.0,2.0,3.0,4.0,5.0]

How can I implement this? I tried df.collect().toVector or rdd.foreach, but every time it returns an array of one-element vectors, [1.0], [2.0], [3.0], [4.0], [5.0], rather than one single vector.

Answer

This happens because collecting a DataFrame gives you an Array[Row], one Row per record, so each element still wraps its value. You need to extract the value from each Row object:

df.collect().map(x => x.getDouble(0)).toVector
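Below is a minimal, self-contained sketch of the same idea. It assumes spark-sql and spark-mllib are on the classpath and that the column is named "data" and may hold integers; `ColumnToVector`, `columnToVector`, `columnToVectorTyped` and `toMLVector` are hypothetical names for illustration. The cast to double is there because `getDouble(0)` fails with a ClassCastException on an integer column. The output format shown in the question, [1.0,2.0,3.0,4.0,5.0], is also how a Spark ML `DenseVector` prints, so `toMLVector` covers the case where that kind of vector is what's needed.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors}
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnToVector {
  // Hypothetical helper: collect one numeric column into a Scala Vector[Double].
  // Cast first, since Row.getDouble(0) throws on an integer column. Nulls are not handled.
  def columnToVector(df: DataFrame, name: String): Vector[Double] =
    df.select(col(name).cast("double"))
      .collect()            // Array[Row], one Row per record
      .map(_.getDouble(0))  // pull the single value out of each Row
      .toVector

  // Alternative: let Spark decode the single column as a Dataset[Double].
  def columnToVectorTyped(df: DataFrame, name: String): Vector[Double] =
    df.select(col(name).cast("double"))
      .as[Double](Encoders.scalaDouble) // the first (only) column is decoded as Double
      .collect()
      .toVector

  // If a Spark ML vector is wanted instead of a Scala collection.
  def toMLVector(values: Vector[Double]): MLVector = Vectors.dense(values.toArray)

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in an existing job, reuse your own SparkSession.
    val spark = SparkSession.builder()
      .appName("column-to-vector")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumed input: a single integer column named "data", as in the question.
    val df = Seq(1, 2, 3, 4, 5).toDF("data")

    val values = columnToVector(df, "data")
    println(values)             // Vector(1.0, 2.0, 3.0, 4.0, 5.0)
    println(toMLVector(values)) // [1.0,2.0,3.0,4.0,5.0]

    spark.stop()
  }
}
```

Either approach brings the whole column to the driver, so it is only appropriate when the column comfortably fits in driver memory.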