Shankar - 1 month ago
Scala Question

Difference between sc.broadcast and the broadcast function in Spark SQL

I have used sc.broadcast for lookup files to improve performance.

I also came to know there is a function called broadcast in Spark SQL functions.

What is the difference between the two?

Which one should I use for broadcasting reference/lookup tables?

Answer

If you want a broadcast join in Spark SQL, you should use the broadcast function (combined with the desired spark.sql.autoBroadcastJoinThreshold configuration). It will:

  • Mark the given relation for broadcasting.
  • Adjust the SQL execution plan.
  • When the output relation is evaluated, take care of collecting the data, broadcasting it, and applying the correct join mechanism.
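As a minimal sketch of the broadcast function in a join, assuming Spark 2.x or later (for SparkSession) and a local master: the table contents, column names, and the object and method names below are hypothetical and only there for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {

  // Joins a (hypothetical) larger table against a small lookup table,
  // marking the lookup side for broadcasting.
  def joinWithLookup(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Illustrative data only; in practice these would be read from files or tables.
    val orders = Seq((1, "US"), (2, "DE"), (3, "US")).toDF("order_id", "country_code")
    val countries = Seq(("US", "United States"), ("DE", "Germany"))
      .toDF("country_code", "country_name")

    // broadcast() only marks `countries`; nothing is collected or shipped
    // until the joined result is actually evaluated.
    orders.join(broadcast(countries), Seq("country_code"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .master("local[*]") // assumption: local run for demonstration
      .getOrCreate()

    val joined = joinWithLookup(spark)

    // The physical plan should list a broadcast-based join for this query.
    joined.explain()
    joined.show()

    spark.stop()
  }
}
```

Inspecting the output of explain() is a simple way to check whether the planner actually picked a broadcast join for a given query.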

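SparkContext.broadcast, by contrast, is used to handle local objects: it ships a driver-side value to the executors as a read-only variable. It can be used alongside Spark DataFrames (for example inside a UDF), but it does not by itself turn a join into a broadcast join.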
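For comparison, here is a rough sketch of SparkContext.broadcast applied to a local lookup map and read inside a UDF, again assuming Spark 2.x or later. The map contents, column names, the "unknown" fallback, and the object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.udf

object BroadcastVariableSketch {

  // Adds a country_name column by looking codes up in a broadcast local Map.
  def withCountryName(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // A local, driver-side object (not a DataFrame); illustrative values only.
    val countryNames = Map("US" -> "United States", "DE" -> "Germany")

    // Ship the map to the executors once as a read-only broadcast variable.
    val countryNamesBc = spark.sparkContext.broadcast(countryNames)

    // The UDF reads the broadcast value on the executors.
    val toCountryName =
      udf((code: String) => countryNamesBc.value.getOrElse(code, "unknown"))

    val orders = Seq((1, "US"), (2, "DE"), (3, "FR")).toDF("order_id", "country_code")
    orders.withColumn("country_name", toCountryName($"country_code"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-variable-sketch")
      .master("local[*]") // assumption: local run for demonstration
      .getOrCreate()

    withCountryName(spark).show()

    spark.stop()
  }
}
```

Here the lookup is an ordinary Scala Map rather than a relation, so the SQL planner is not involved; if the reference data is already a DataFrame, the broadcast function in a join is usually the more direct route.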
