John Todd - 3 months ago
Scala Question

Pivot non-numeric table in Spark Scala

Is it possible to pivot a table with non-numeric values in Spark Scala? I have reviewed the following two Stack Overflow questions:

Pivot Spark Dataframe

List in the Case-When Statement in Spark SQL

Following the steps in the "List in the Case-When" question, I can transform my data so that each data type (tag) becomes a column, but I still get one row for each entity-data type (id-tag) combination instead of one row per entity.

id tag value
1 US foo
1 UK bar
1 CA baz
2 US hoo
2 UK hah
2 CA wah

id US UK CA
1 foo null null
1 null bar null
1 null null baz
2 hoo null null
2 null hah null
2 null null wah


Is there a "first non-null" function that can collapse the multiple rows for each entity into just one?

id US UK CA
1 foo bar baz
2 hoo hah wah

Answer

You may consider the RDD aggregate method (or aggregateByKey, if the data is keyed by id). You just need to write the proper functions to keep the non-null element at each position.
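
As a rough illustration of that idea, here is a minimal sketch of the two functions aggregateByKey would need. The names FirstNonNull, Slots, seqOp, combOp and pivotLocally are made up for this example, and the column order (US, UK, CA) is assumed to be known up front. To keep it runnable without a cluster, plain Scala collections stand in for the RDD; the actual Spark call appears only in a comment.

```scala
object FirstNonNull {
  // Output column order; assumed to be known before aggregating.
  val tags: Vector[String] = Vector("US", "UK", "CA")

  // One pivoted row: a slot per tag, None until a value is seen.
  type Slots = Vector[Option[String]]
  val zero: Slots = Vector.fill(tags.length)(None)

  // Fold one (tag, value) pair into the row, keeping the first
  // non-null value seen for each slot. Unknown tags are ignored.
  def seqOp(acc: Slots, tagValue: (String, String)): Slots = {
    val (tag, value) = tagValue
    val i = tags.indexOf(tag)
    if (i >= 0 && acc(i).isEmpty) acc.updated(i, Option(value)) else acc
  }

  // Merge two partial rows for the same id, preferring the left slot
  // when both are filled.
  def combOp(a: Slots, b: Slots): Slots =
    a.zip(b).map { case (x, y) => x.orElse(y) }

  // With Spark, given a hypothetical rdd: RDD[(Int, (String, String))]
  // of (id, (tag, value)) pairs, the call would be:
  //   rdd.aggregateByKey(zero)(seqOp, combOp)
  //
  // Local stand-in for that call: each inner Seq plays one partition.
  def pivotLocally(partitions: Seq[Seq[(Int, (String, String))]]): Map[Int, Slots] =
    partitions
      .flatMap(_.groupBy(_._1).map { case (id, rows) =>
        id -> rows.map(_._2).foldLeft(zero)(seqOp) // within a partition
      })
      .groupBy(_._1)
      .map { case (id, partials) =>
        id -> partials.map(_._2).reduce(combOp) // across partitions
      }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq((1, ("US", "foo")), (1, ("UK", "bar")), (2, ("US", "hoo"))),
      Seq((1, ("CA", "baz")), (2, ("UK", "hah")), (2, ("CA", "wah")))
    )
    // Prints one line per id: "1 foo bar baz" then "2 hoo hah wah".
    pivotLocally(partitions).toSeq.sortBy(_._1).foreach { case (id, slots) =>
      println((id.toString +: slots.map(_.getOrElse(""))).mkString(" "))
    }
  }
}
```

If you are on Spark 1.6 or later and working with a DataFrame rather than an RDD, something like groupBy("id").pivot("tag").agg(first("value")) may get you the same result with less code.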
