Shankar - 1 year ago
Scala Question

Difference between sc.broadcast and the broadcast function in Spark SQL

I have used sc.broadcast for lookup files to improve performance.

I also learned that there is a function called broadcast in the Spark SQL functions (org.apache.spark.sql.functions).

What is the difference between the two?

Which one should I use for broadcasting reference/lookup tables?

Answer

If you want a broadcast join in Spark SQL, you should use the broadcast function (combined with the desired spark.sql.autoBroadcastJoinThreshold configuration). It will:

  • Mark the given relation for broadcasting.
  • Adjust the SQL execution plan.
  • When the output relation is evaluated, take care of collecting the data, broadcasting it, and applying the correct join mechanism.
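A minimal sketch of the broadcast function in a Scala job, assuming Spark 2.x or later. The DataFrames, column names, and the object and method names (BroadcastJoinSketch, enrich) are hypothetical. The threshold is set to 10 MB, which is Spark's default, only to show where the configuration goes.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  // Hypothetical helper: joins a (presumably large) fact table with a small
  // lookup table, marking the lookup side for broadcasting via broadcast().
  def enrich(orders: DataFrame, countries: DataFrame): DataFrame =
    orders.join(broadcast(countries), Seq("countryCode"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .master("local[*]")
      // Size (in bytes) below which Spark may broadcast a join side on its own.
      // 10 MB equals the default; it is set here only for illustration.
      .config("spark.sql.autoBroadcastJoinThreshold", (10L * 1024 * 1024).toString)
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: a fact table and a small lookup table.
    val orders = Seq((1, "US", 10.0), (2, "DE", 20.0), (3, "US", 5.0))
      .toDF("orderId", "countryCode", "amount")
    val countries = Seq(("US", "United States"), ("DE", "Germany"))
      .toDF("countryCode", "countryName")

    val joined = enrich(orders, countries)
    joined.explain() // prints the physical plan chosen for the join
    joined.show()

    spark.stop()
  }
}
```

Running explain() on the result should show a BroadcastHashJoin in the physical plan, which is the plan adjustment described in the list above.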

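SparkContext.broadcast, by contrast, is used to handle local (driver-side) objects, such as a lookup Map. The resulting broadcast variable can still be used with Spark DataFrames, for example by reading its value inside a UDF, but it does not make Spark SQL plan a broadcast join.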
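For contrast, a sketch of SparkContext.broadcast used with a DataFrame: a local Map is broadcast once and read inside a UDF, so the lookup happens row by row on the executors and no join appears in the plan. The object, method, and column names (BroadcastVariableSketch, enrich, countryCode, countryName) are hypothetical, and this illustrates the mechanism rather than recommending it over the broadcast join.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object BroadcastVariableSketch {
  // Hypothetical helper: adds a countryName column by looking up countryCode
  // in a driver-side Map shipped to the executors as a broadcast variable.
  def enrich(spark: SparkSession, orders: DataFrame, lookup: Map[String, String]): DataFrame = {
    // sc.broadcast: sends a read-only copy of a local object to each executor.
    val bcLookup = spark.sparkContext.broadcast(lookup)

    // The UDF reads the broadcast value on the executors; unknown codes yield null.
    val toName = udf((code: String) => bcLookup.value.getOrElse(code, null))

    orders.withColumn("countryName", toName(col("countryCode")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-variable-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val orders = Seq((1, "US"), (2, "DE"), (3, "FR")).toDF("orderId", "countryCode")
    val lookup = Map("US" -> "United States", "DE" -> "Germany")

    // "FR" has no entry in the Map, so its countryName is null.
    enrich(spark, orders, lookup).show()
    spark.stop()
  }
}
```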
