Scala Question

Joining two DataFrames in Spark SQL and getting data of only one

I have two DataFrames in Spark SQL (D1 and D2).

I am trying to inner join them [D1.join(D2, "some column")]
and get back the data of only D1, not the complete data set.

Both D1 and D2 have the same columns.

Could someone please help me with this?

I am using Spark 1.6.

Answer

Let's say you want to join on the "id" column. Then you could write:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._    // enables the $"..." column syntax

// Alias both DataFrames, join on id, and keep only D1's columns
d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id").select($"d1.*")
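For context, here is a minimal, self-contained sketch of the same idea. The case class Record and the sample rows are hypothetical, only there to make the snippet runnable on Spark 1.6. Since both DataFrames share the same schema, a "leftsemi" join is another way to get back only D1's rows and columns:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical schema and data, for illustration only
case class Record(id: Int, value: String)

val conf = new SparkConf().setAppName("join-example").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val d1 = sc.parallelize(Seq(Record(1, "a"), Record(2, "b"))).toDF()
val d2 = sc.parallelize(Seq(Record(2, "x"), Record(3, "y"))).toDF()

// Alias-based inner join keeping only D1's columns, as shown above
val joined = d1.as("d1").join(d2.as("d2"), $"d1.id" === $"d2.id").select($"d1.*")

// Alternative: a left semi join returns only the rows (and columns) of d1
// that have a matching id in d2, so there are no duplicate columns to drop
val semi = d1.join(d2, d1("id") === d2("id"), "leftsemi")

Both joined and semi contain only D1's columns; the difference is that the alias-based inner join can produce duplicate rows if an id appears more than once in D2, while the left semi join returns each matching D1 row at most once.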