user4046073 - 1 year ago
Scala Question

Why is join not possible after show operator?

The following code works fine until I add .show after .agg. Why is .show not possible there?

val tempTableB = tableB.groupBy("idB")
.agg(first("numB").as("numB")) //when I add a .show here, it doesn't work

tableA.join(tempTableB, $"idA" === $"idB", "inner")
.drop("idA", "numA").show


The error says:

error: overloaded method value join with alternatives:
(right: org.apache.spark.sql.Dataset[_],joinExprs: org.apache.spark.sql.Column,joinType: String)org.apache.spark.sql.DataFrame <and>
(right: org.apache.spark.sql.Dataset[_],usingColumns: Seq[String],joinType: String)org.apache.spark.sql.DataFrame
cannot be applied to (Unit, org.apache.spark.sql.Column, String)
tableA.join(tempTableB, $"idA" === $"idB", "inner")
^


Why is this behaving this way?

Answer Source

.show() is a method with what we call in Scala a side effect. It prints to stdout and returns Unit (whose only value is written ()), just like println.

Example:

val a  = Array(1,2,3).foreach(println)
a: Unit = ()

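In Scala, every expression evaluates to a value, and that includes method calls made only for their side effects. In your case, when .show is the last call in the chain, the whole expression evaluates to () of type Unit, and that is what gets stored in tempTableB instead of a DataFrame. This is why the compiler reports that join cannot be applied to (Unit, org.apache.spark.sql.Column, String).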
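One way to keep both the preview and the join is to assign the aggregated DataFrame to tempTableB first and then call show on it as a separate statement. The following is a minimal sketch of that idea, not the asker's actual setup: the local SparkSession, the object and method names, and the sample rows in tableA and tableB are all assumptions made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object ShowThenJoin {
  // Hypothetical helper: builds sample tables, previews the aggregate, then joins.
  def joinAfterShow(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Made-up sample data standing in for the asker's tables.
    val tableA = Seq((1, 10), (2, 20)).toDF("idA", "numA")
    val tableB = Seq((1, 100), (1, 101), (3, 300)).toDF("idB", "numB")

    // Keep the DataFrame itself in the val ...
    val tempTableB = tableB.groupBy("idB").agg(first("numB").as("numB"))
    // ... and call show as its own statement, discarding the Unit it returns.
    tempTableB.show()

    // tempTableB is still a DataFrame, so join type-checks.
    tableA.join(tempTableB, $"idA" === $"idB", "inner")
      .drop("idA", "numA")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally for the example
      .appName("show-then-join")
      .getOrCreate()
    try joinAfterShow(spark).show()
    finally spark.stop()
  }
}
```

The only change from the question's code is that show is no longer the last call in the expression assigned to tempTableB, so the val keeps the DataFrame type that join expects.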
