Newbie Newbie - 2 months ago 6
Scala Question

How to do an OUTER JOIN in Scala

I have two data frames: df1 and df2

df1

|--- id---|---value---|
| 1 | 23 |
| 2 | 23 |
| 3 | 23 |
| 2 | 25 |
| 5 | 25 |


df2

|-idValue-|---count---|
| 1 | 33 |
| 2 | 23 |
| 3 | 34 |
| 13 | 34 |
| 23 | 34 |


How do I get this?

|--- id--------|---value---|---count---|
| 1 | 23 | 33 |
| 2 | 23 | 23 |
| 3 | 23 | 34 |
| 2 | 25 | 23 |
| 5 | 25 | null |


I am doing:

val groupedData = df1.join(df2, $"id" === $"idValue", "outer")


But I don't see the last column in groupedData. Is this the correct way to do it, or am I doing something wrong?

Answer

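From your expected output, you need a LEFT OUTER JOIN: it keeps every row of df1 and fills count with null where df2 has no match (id 5). The "outer" join type is a full outer join, which would also return the unmatched df2 rows (idValue 13 and 23).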

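// Assumes import spark.implicits._ is in scope for the $"..." syntax.
// df1 has no "count" column, so select df1("value") and df2("count").
val groupedData = df1.join(df2, $"id" === $"idValue", "left_outer")
  .select(df1("id"), df1("value"), df2("count"))

groupedData.take(10).foreach(println)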
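
If you want to try this end to end, here is a minimal sketch that rebuilds the two data frames from the question and runs the left outer join. The object name LeftOuterJoinExample, the helper joined, the app name, and the local[*] master are my own assumptions for illustration, not part of the original code, and it assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, only here to make the sketch runnable.
object LeftOuterJoinExample {

  // Rebuilds df1 and df2 from the question and left-outer-joins them.
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables toDF and the $"col" syntax

    val df1 = Seq((1, 23), (2, 23), (3, 23), (2, 25), (5, 25))
      .toDF("id", "value")
    val df2 = Seq((1, 33), (2, 23), (3, 34), (13, 34), (23, 34))
      .toDF("idValue", "count")

    // Every df1 row is kept; count is null where no idValue matches id.
    df1.join(df2, $"id" === $"idValue", "left_outer")
      .select($"id", $"value", $"count")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimenting (assumed settings).
    val spark = SparkSession.builder()
      .appName("left-outer-join-example")
      .master("local[*]")
      .getOrCreate()

    try joined(spark).show()
    finally spark.stop()
  }
}
```

The result should have the five df1 rows, with a null count only for id 5. Row order is not guaranteed, so add an orderBy if you need the rows in a fixed order.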