
Concurrent tasks on a Spark executor

What determines how many tasks can run concurrently on a Spark executor? Is it perhaps some kind of thread pool with shared memory resources?

What parameters control that behavior?

Does it mean that code used in executors should always be written to be thread-safe?

Answer

What determines how many tasks can run concurrently on a Spark executor?

Spark maps the number of tasks that can run concurrently on a particular executor to the number of cores allocated to it (spark.executor.cores, or --executor-cores). The number of cores a single task claims is controlled by spark.task.cpus, which defaults to 1, so by default an executor runs as many tasks in parallel as it has cores.
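For illustration, here is a minimal sketch of setting those two parameters when building a session; the concrete values (4 cores per executor, 1 CPU per task) are just example numbers, so with this configuration each executor could run up to 4 tasks at the same time:

```scala
import org.apache.spark.sql.SparkSession

// Example configuration: 4 cores per executor, each task claiming 1 core,
// so up to 4 / 1 = 4 tasks may run concurrently on one executor.
val spark = SparkSession.builder()
  .appName("concurrency-example")       // example application name
  .config("spark.executor.cores", "4")  // cores allocated to each executor
  .config("spark.task.cpus", "1")       // cores claimed by each task (the default)
  .getOrCreate()
```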

Does it mean that code used in executors should always be written to be thread-safe?

No, not in general. RDD and DataFrame/Dataset transformations are designed so that each task works locally on its own partition of the data, without sharing mutable state. Thread safety only becomes a concern when you introduce a resource that is shared across tasks within a single executor JVM, for example a singleton object or a static cache, because multiple tasks can execute on the same executor at the same time.
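As a sketch of that distinction, the hypothetical ExecutorCounter below is a JVM-wide singleton, so every task running in the same executor sees the same object and its state must be thread-safe, while the per-record work inside the map needs no synchronization:

```scala
import org.apache.spark.sql.SparkSession
import java.util.concurrent.atomic.AtomicLong

// Hypothetical per-executor singleton: all tasks in the same executor JVM
// share this object, so its state must tolerate concurrent updates.
object ExecutorCounter {
  val processed = new AtomicLong(0L)
}

object ThreadSafetyExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("thread-safety-example")
      .master("local[4]") // up to 4 tasks run concurrently in this JVM
      .getOrCreate()

    val result = spark.sparkContext
      .parallelize(1 to 1000, numSlices = 8)
      .map { x =>
        // Purely local work on this task's partition: no synchronization needed.
        // Updating the shared singleton, however, does need a thread-safe type.
        ExecutorCounter.processed.incrementAndGet()
        x * 2
      }
      .count()

    // In local mode driver and executor share one JVM, so the counter is visible here.
    println(s"records: $result, counted in this JVM: ${ExecutorCounter.processed.get()}")
    spark.stop()
  }
}
```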