ask ask - 28 days ago 12
Scala Question

Spark Scala: Issue Substituting Filter Expression In DataFrame

I have a dataframe created which holds the join of 2 tables.

I want to compare each field of table1 to that of table2 (Schema is same)
Columns in Table A = colA1, colB1, colC1 , ...
Columns in Table B = colA2, colB2, colC2, ...
So, I need to filter out the data which satisfies the condition

(colA1 = colA2) AND (colB1 = colB2) AND (colC1 = colC2) and so on.
Since my table has a lot of fields, I tried to build a similar exp.

val filterCols = Seq("colA","colB","colC")
val sq = '"'
val exp = filterCols.map({ x => s"(join_df1($sq${x}1$sq) === join_df1($sq${x}2$sq))" }).mkString(" && ")
Resultant Exp : res28: String = (join_df1("colA1") === join_df1("colA2")) && (join_df1("colB1") === join_df1("colB2")) && (join_df1("colC1") === join_df1("colC2"))


Now when i try to substitute it to the dataframe, it throws me an error.
join_df1.filter($exp)

I am not sure whether I am doing it right .I need to find a way to substitute my expression and filter out value.
Any help is appreciated.

enter image description hereThanks in advance

Answer

This is not valid SQL. Try:

val df = Seq(
  ("a", "a", "b", "b", "c", "c"),
  ("a", "A", "b", "B", "c", "C")).toDF("a1", "a2", "b1", "b2", "c1", "c2")

val filterCols = Seq("A", "B", "C")

val exp = filterCols.map(x => s"${x}1 = ${x}2").mkString(" AND ")
df.where(exp)
Comments