ask - 1 year ago 63

Scala Question

I have a dataframe created which holds the join of 2 tables.

I want to compare each field of table1 to that of table2 (Schema is same)

Columns in Table A = colA1, colB1, colC1 , ...

Columns in Table B = colA2, colB2, colC2, ...

So, I need to filter out the data which satisfies the condition

(colA1 = colA2) AND (colB1 = colB2) AND (colC1 = colC2) and so on.

Since my table has a lot of fields, I tried to build a similar exp.

`val filterCols = Seq("colA","colB","colC")`

val sq = '"'

val exp = filterCols.map({ x => s"(join_df1($sq${x}1$sq) === join_df1($sq${x}2$sq))" }).mkString(" && ")

Resultant Exp : res28: String = (join_df1("colA1") === join_df1("colA2")) && (join_df1("colB1") === join_df1("colB2")) && (join_df1("colC1") === join_df1("colC2"))

Now when i try to substitute it to the dataframe, it throws me an error.

join_df1.filter($exp)

I am not sure whether I am doing it right .I need to find a way to substitute my expression and filter out value.

Any help is appreciated.

Thanks in advance

Answer Source

This is not valid SQL. Try:

```
val df = Seq(
("a", "a", "b", "b", "c", "c"),
("a", "A", "b", "B", "c", "C")).toDF("a1", "a2", "b1", "b2", "c1", "c2")
val filterCols = Seq("A", "B", "C")
val exp = filterCols.map(x => s"${x}1 = ${x}2").mkString(" AND ")
df.where(exp)
```