Chen Fan Chen Fan - 5 months ago 39
Scala Question

Scala-dataframe, Merge different columns into new rows

I am currently using scala and wonder if we can merge different columns into one?
For example if I got:

+------+--------+-------+----------+-----+
| User | family | phone | location | raz |
+------+--------+-------+----------+-----+
| u1 | f1 | p1 | l1 | r1 |
+------+--------+-------+----------+-----+
| u2 | f2 | p2 | l2 | r2 |
+------+--------+-------+----------+-----+
| u3 | f3 | p3 | l3 | r3 |
+------+--------+-------+----------+-----+


How do I combine phone,location and raz into 1 column, each one of their value on a different row?

| User | family | new |
+------+--------+-------+
| u1 | f1 | p1 |
+------+--------+-------+
| u1 | f1 | l1 |
+------+--------+-------+
| u1 | f1 | r1 |
+------+--------+-------+
| u2 | f2 | p2 |
+------+--------+-------+
| u2 | f2 | l2 |
+------+--------+-------+
| u2 | f2 | r2 |
+------+--------+-------+
| u3 | f3 | p3 |
+------+--------+-------+
| u3 | f3 | l3 |
+------+--------+-------+
| u3 | f3 | r3 |
+------+--------+-------+


Thanks

Answer Source

One approach would be to gather those columns to be flattened into an array column and explode it:

val df = Seq(
  ("u1", "f1", "p1", "l1", "r1"),
  ("u2", "f2", "p2", "l2", "r2"),
  ("u3", "f3", "p3", "l3", "r3")
).toDF("User", "family", "phone", "location", "raz")

val df2 = df.
  withColumn("plr", array($"phone", $"location", $"raz")).
  withColumn("new", explode($"plr")).
  select("User", "family", "new")

df2.show
+----+------+---+
|User|family|new|
+----+------+---+
|  u1|    f1| p1|
|  u1|    f1| l1|
|  u1|    f1| r1|
|  u2|    f2| p2|
|  u2|    f2| l2|
|  u2|    f2| r2|
|  u3|    f3| p3|
|  u3|    f3| l3|
|  u3|    f3| r3|
+----+------+---+