SH Y. - 11 months ago
Scala Question

How to Convert a Column of Dataframe to A List in Apache Spark?

I would like to convert a string column of a DataFrame to a list. The only relevant thing I could find in the DataFrame API is rdd, so I tried converting the DataFrame to an RDD first and then applying the toArray function to it. In this case, the length and SQL operations work just fine. However, every element of the result I got from the RDD is wrapped in square brackets, like this: [A00001]. I was wondering if there is an appropriate way to convert a column to a list, or a way to remove the square brackets.

Any suggestions would be appreciated. Thank you!

Answer

This should return a single collection containing the column's values:

    dataFrame.select("YOUR_COLUMN_NAME").rdd.map(r => r(0)).collect()

Without the mapping, you get an Array of Row objects, each holding every selected column. A Row prints its fields inside square brackets, which is where output like [A00001] comes from.
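A minimal sketch comparing a plain collect() with the r(0) mapping, assuming Spark 2.x or later running in local mode. The column name `id`, the sample values, and the object name `RowVsValue` are hypothetical, chosen only for illustration:

```scala
import org.apache.spark.sql.{Row, SparkSession}

object RowVsValue {
  // Returns (string form of each collected Row, raw values extracted with r(0))
  def run(spark: SparkSession): (Array[String], Array[Any]) = {
    import spark.implicits._

    // Hypothetical single-column DataFrame standing in for the real data
    val df = Seq("A00001", "A00002").toDF("id")

    // Without the mapping: an Array[Row]; Row.toString wraps the fields in brackets
    val rows: Array[Row] = df.select("id").collect()

    // With the mapping: the first field of every Row, statically typed as Any
    val values: Array[Any] = df.select("id").rdd.map(r => r(0)).collect()

    (rows.map(_.toString), values)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("row-vs-value").getOrCreate()
    try {
      val (rowStrings, values) = run(spark)
      println(rowStrings.mkString(", ")) // [A00001], [A00002]
      println(values.mkString(", "))     // A00001, A00002
    } finally spark.stop()
  }
}
```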

Keep in mind that this gives you an Array[Any], because r(0) returns Any. If you want to specify the result type, add .asInstanceOf[YOUR_TYPE] to the mapping: r => r(0).asInstanceOf[YOUR_TYPE]. Call .toList on the collected result if you need a List rather than an Array.
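If the column is known to hold strings, a sketch like the following gets a typed result without writing the cast by hand. It assumes Spark 2.x or later and a string column; Row.getString(0) is equivalent to r(0).asInstanceOf[String]. The helper `ColumnToList.stringColumn` and the `id` sample data are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnToList {
  // Collects one string column into a List[String].
  // Assumes the column holds strings; for another type, getString fails with a ClassCastException.
  def stringColumn(df: DataFrame, column: String): List[String] =
    df.select(column).rdd.map(r => r.getString(0)).collect().toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("column-to-list").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq("A00001", "A00002").toDF("id") // hypothetical sample data
      val ids: List[String] = stringColumn(df, "id")
      println(ids) // List(A00001, A00002)
    } finally spark.stop()
  }
}
```

r.getAs[String](0) or r.getAs[String]("id") would work the same way inside the mapping.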

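P.S. In Spark 1.x, DataFrame.map delegates to the underlying RDD, so you can skip the .rdd part there. In Spark 2.x and later, a DataFrame is a Dataset[Row] and map needs an Encoder for its result type, so a mapping that returns Any will not compile without .rdd.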
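On Spark 2.x and later, one way to avoid .rdd is to turn the selected column into a typed Dataset first. This sketch assumes a string column and passes the String encoder explicitly via Encoders.STRING; the helper name and sample data are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}

object ColumnToListDataset {
  // Converts the one-column selection to a Dataset[String] with an explicit
  // String encoder, so no .rdd step or Any-to-String cast is needed.
  def stringColumn(df: DataFrame, column: String): List[String] =
    df.select(column).as[String](Encoders.STRING).collect().toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("column-to-list-ds").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq("A00001", "A00002").toDF("id") // hypothetical sample data
      println(stringColumn(df, "id")) // List(A00001, A00002)
    } finally spark.stop()
  }
}
```

With import spark.implicits._ in scope, the explicit encoder argument can be dropped and written simply as .as[String].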