aola - 1 month ago
Scala Question

Joining DataFrames in Spark

I would like to join two DataFrames, edges and selectedComponent, on two keys combined with an "or" condition. selectedComponent is defined as:

val selectedComponent = hiveContext.sql(s"""select * from $tableWithComponents
|where component=$component""".stripMargin)


I do not want to write the join as raw SQL, like this:

val theSelectedComponentEdges = hiveContext.sql(
s"""select * from $tableWithComponents a join $edges b where (b.src=a.id or b.dst=a.id)""")


but rather with the DataFrame join function:

edges.join(selectedComponent, edges("src")===selectedComponent("id"))


but I am not sure how I am supposed to express the "or" here.

Can anyone help me? :-)

Answer
edges.join(selectedComponent,
  (edges("src") === selectedComponent("id")) || (edges("dst") === selectedComponent("id")))
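Spark's Column type supports the boolean operators || and && directly, so the two equality conditions can be combined into a single join expression. Here is a minimal, self-contained sketch; the sample data is made up for illustration, and it uses the SparkSession API (the question's HiveContext-based DataFrames accept the exact same join call):

```scala
import org.apache.spark.sql.SparkSession

object OrJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("or-join-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the Hive tables.
    val edges = Seq((1, 2), (3, 4), (5, 6)).toDF("src", "dst")
    val selectedComponent = Seq(2, 3).toDF("id")

    // An edge is kept if EITHER endpoint matches a component id.
    // === is Column equality (not ==); || builds an Or expression
    // that Spark evaluates as the join condition.
    val result = edges.join(
      selectedComponent,
      edges("src") === selectedComponent("id") ||
        edges("dst") === selectedComponent("id"))

    result.show()
    spark.stop()
  }
}
```

One caveat worth knowing: a join condition containing OR is not an equi-join, so Spark cannot use a hash- or sort-merge join for it and may fall back to a much slower nested-loop plan. For large tables it can be faster to run two separate equi-joins (one on src, one on dst) and union the results.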