Java Question

How to set up executor IPs inside Docker containers?

For the last 3 days I have tried to set up a Docker machine with 3 components:
a Spark master, a Spark worker, and a driver (Java) application.

When starting the driver outside of Docker, everything works fine. However, starting all three components in Docker leads to a port/firewall/host nightmare.

To keep it simple (at first) I use docker-compose; this is my docker-compose.yml:

driver:
  hostname: driver
  image: driverimage
  command: -Dexec.args="0 192.168.99.100" -Dspark.driver.port=7001 -Dspark.driver.host=driver -Dspark.executor.port=7006 -Dspark.broadcast.port=15001 -Dspark.fileserver.port=15002 -Dspark.blockManager.port=15003 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory
  ports:
    - 10200:10200 # Module REST port
    - 4040:4040   # Web UI (Spark)
    - 7001:7001   # Driver port (Spark)
    - 15001:15001 # Broadcast (Spark)
    - 15002:15002 # File server (Spark)
    - 15003:15003 # Block manager (Spark)
    - 7337:7337   # Shuffle? (Spark)
  extra_hosts:
    - sparkmaster:192.168.99.100
    - sparkworker:192.168.99.100
  environment:
    SPARK_LOCAL_IP: 192.168.99.100
    #SPARK_MASTER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_WORKER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_JAVA_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=15001 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"

sparkmaster:
  extra_hosts:
    - driver:192.168.99.100
  image: gettyimages/spark
  command: /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h sparkmaster
  hostname: sparkmaster
  environment:
    SPARK_CONF_DIR: /conf
    MASTER: spark://sparkmaster:7077
    SPARK_LOCAL_IP: 192.168.99.100
    SPARK_JAVA_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_WORKER_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_MASTER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_WORKER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_JAVA_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
  expose:
    - 7001
    - 7002
    - 7003
    - 7004
    - 7005
    - 7006
    - 7077
    - 6066
  ports:
    - 6066:6066
    - 7077:7077 # Master (main port)
    - 8080:8080 # Web UI
    #- 7006:7006 # Executor

sparkworker:
  extra_hosts:
    - driver:192.168.99.100
  image: gettyimages/spark
  command: /usr/spark/bin/spark-class org.apache.spark.deploy.worker.Worker -h sparkworker spark://sparkmaster:7077
  #volumes:
  #  - ./spark/logs:/log/spark
  hostname: sparkworker
  environment:
    SPARK_CONF_DIR: /conf
    SPARK_WORKER_CORES: 4
    SPARK_WORKER_MEMORY: 4g
    SPARK_WORKER_PORT: 8881
    SPARK_WORKER_WEBUI_PORT: 8081
    SPARK_LOCAL_IP: 192.168.99.100
    #SPARK_MASTER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_JAVA_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_MASTER_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_WORKER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=15003 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_JAVA_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
  links:
    - sparkmaster
  expose:
    - 7001
    - 7002
    - 7003
    - 7004
    - 7005
    - 7006
    - 7012
    - 7013
    - 7014
    - 7015
    - 7016
    - 8881
  ports:
    - 8081:8081 # Web UI
    #- 15003:15003 # Block manager
    - 7005:7005 # Executor
    - 7006:7006 # Executor
    #- 7006:7006 # Executor


By now I don't even really know anymore which port is used for what. What I do know is my current problem: the driver can communicate with the master, the master can communicate with the worker, and I think the driver can communicate with the worker. However, the driver cannot communicate with the executor. I have also pinpointed the problem: when I open the application UI and look at the Executors tab, it shows "Executor 0 - Address 172.17.0.1:7005".

So the problem is that the driver addresses the executor with the Docker gateway address, which does not work. I have tried several things (SPARK_LOCAL_IP, explicit hostnames, etc.), but the driver always tries to communicate with the Docker gateway. Any ideas how to get the driver to communicate with the executor/worker?
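For context, the -Dspark.* flags I launch the driver with amount to the following configuration inside the Java application (a minimal sketch for Spark 1.x; the app name is a placeholder):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DriverSketch {
    public static void main(String[] args) {
        // Same settings as the -Dspark.* flags in the docker-compose.yml above.
        SparkConf conf = new SparkConf()
                .setAppName("docker-driver")           // placeholder name
                .setMaster("spark://sparkmaster:7077")
                .set("spark.driver.host", "driver")    // hostname executors use to call back
                .set("spark.driver.port", "7001")      // fixed so it can be forwarded 1-to-1
                .set("spark.executor.port", "7006")
                .set("spark.broadcast.port", "15001")
                .set("spark.fileserver.port", "15002")
                .set("spark.blockManager.port", "15003")
                .set("spark.broadcast.factory",
                     "org.apache.spark.broadcast.HttpBroadcastFactory");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... actual job code ...
        sc.stop();
    }
}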

Answer

This is due to insufficient configuration options provided by Spark. Spark binds its listening sockets to SPARK_LOCAL_HOSTNAME and propagates this exact hostname to the cluster. Unfortunately, this setup does not work when the driver is behind NAT, for example in a Docker container.
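For illustration, the advertised hostname is picked roughly like this (a simplified sketch of the logic in org.apache.spark.util.Utils in Spark 1.x, not the actual Spark source):

import java.net.InetAddress;

public class AdvertisedHostnameSketch {
    // Rough approximation of how Spark chooses the hostname it advertises.
    static String advertisedHostname() throws Exception {
        String custom = System.getenv("SPARK_LOCAL_HOSTNAME");
        if (custom != null) {
            return custom; // taken verbatim: bound locally AND sent to the cluster as-is
        }
        String ip = System.getenv("SPARK_LOCAL_IP");
        InetAddress addr = (ip != null)
                ? InetAddress.getByName(ip)   // bind address forced via SPARK_LOCAL_IP
                : InetAddress.getLocalHost(); // otherwise the container's own address
        // Behind NAT (e.g. inside a Docker container) this is an address that
        // the master and workers cannot route back to.
        return addr.getHostAddress();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("advertised as: " + advertisedHostname());
    }
}

The key point is that SPARK_LOCAL_HOSTNAME is taken verbatim, which is exactly what the workaround below exploits: the same name can resolve differently inside and outside the container.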

You can work around this with the following setup (I've used this hack successfully):

  • forward all necessary ports 1-to-1 (as you do)
  • use a custom hostname for the driver: set e.g. SPARK_LOCAL_HOSTNAME: mydriver
  • for the master and worker nodes, add 192.168.99.100 mydriver to /etc/hosts so that they can reach the Spark driver.
  • for the Docker container, map mydriver to 0.0.0.0. This makes the Spark driver bind to 0.0.0.0, so it is reachable by the master and workers:

To do that in docker-compose.yml, simply add the following lines:

 extra_hosts:
  - "mydriver:0.0.0.0"