Markus Markus - 3 months ago 36
Scala Question

How to properly join two DataFrames for my case?

I use Spark 2.2.0 and Scala 2.11.8. I have some problems with joining two DataFrames.

df1 =

product1_PK product2_PK
111 222
333 111


df2 =

product_PK product_name
111 AAA
222 BBB
333 CCC

I want to get this result:

product1_PK product2_PK product1_name product2_name
111 222 AAA BBB
333 111 CCC AAA

How can I do it?

This is how I tried as a part solution, but I don't know how to efficiently make joining for both
and rename columns:

val result ="left")
.join("right"), $"left.product1_PK" === $"right.product_PK")

Answer Source

You need to use two joins : first for product1_name and second for product2_name

df1.join(df2.withColumnRenamed("product_PK", "product1_PK").withColumnRenamed("product_name", "product1_name"), Seq("product1_PK"), "left")
  .join(df2.withColumnRenamed("product_PK", "product2_PK").withColumnRenamed("product_name", "product2_name"), Seq("product2_PK"), "left")

You should have your desired output as

|222        |111        |AAA          |BBB          |
|111        |333        |CCC          |AAA          |
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download