keypoint - 3 months ago
Linux Question

stop-all.sh in Spark sbin/ folder is not stopping all slave nodes

Hi, I have a Spark cluster in standalone mode, i.e., one Spark master process and three Spark slave processes running on my laptop (the whole Spark cluster on a single machine).

Starting the master and slaves is just a matter of running the scripts Spark_Folder/sbin/start-master.sh and Spark_Folder/sbin/start-slave.sh.

However, when I run Spark_Folder/sbin/stop-all.sh, it stops only the master and one slave; since I have three slaves running, two slaves are still left running after stop-all.sh finishes.

I dug into the script "stop-slaves.sh" and found the following:

if [ "$SPARK_WORKER_INSTANCES" = "" ]; then
  "$sbin"/spark-daemons.sh stop org.apache.spark.deploy.worker.Worker 1
else
  for ((i=0; i<$SPARK_WORKER_INSTANCES; i++)); do
    "$sbin"/spark-daemons.sh stop org.apache.spark.deploy.worker.Worker $(( $i + 1 ))
  done
fi


It seems that this script stops workers based on the "SPARK_WORKER_INSTANCES" number. But what if I started a slave with a non-numeric name?
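The instance argument is baked into the pid file that spark-daemon.sh writes, which is why stopping by number only matches workers that were started by number. A minimal sketch of the naming scheme (the "myWorker1" name here is hypothetical, and the default /tmp pid directory is assumed):

```shell
# Sketch of how spark-daemon.sh derives its pid-file path
# (assumption: SPARK_PID_DIR is unset, so the default /tmp is used).
instance="myWorker1"   # hypothetical name passed to start-slave.sh
cmd="org.apache.spark.deploy.worker.Worker"
pidfile="${SPARK_PID_DIR:-/tmp}/spark-$USER-$cmd-$instance.pid"
echo "$pidfile"
```

Stopping instance "1" would look for a file ending in "Worker-1.pid", which never matches this one.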

And is there any way to shut down the whole Spark cluster with one command? (I know that running "pkill -f spark*" works, though.)

Thanks a lot.

Answer

I just figured out the solution:

In "/usr/lib/spark/conf/spark-env.sh", add an extra parameter "SPARK_WORKER_INSTANCES=3" (or however many slave instances you have), then run "/usr/lib/spark/sbin/stop-all.sh" and all instances stop.
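Concretely, this is the line to add (path as given above; adjust the count to match how many workers you start on the machine):

```shell
# In /usr/lib/spark/conf/spark-env.sh: tell the stop scripts how many
# numbered worker instances were started on this machine.
export SPARK_WORKER_INSTANCES=3
```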

However, "stop-all.sh" works only for slaves you started with numbers, e.g.:

/usr/lib/spark/sbin/start-slave.sh 1 spark://master-address:7077
/usr/lib/spark/sbin/start-slave.sh 2 spark://master-address:7077
/usr/lib/spark/sbin/start-slave.sh 3 spark://master-address:7077

If you start slaves with arbitrary names, "stop-all.sh" does not work, e.g.:

/usr/lib/spark/sbin/start-slave.sh myWorker1 spark://master-address:7077
/usr/lib/spark/sbin/start-slave.sh myWorker2 spark://master-address:7077
/usr/lib/spark/sbin/start-slave.sh myWorker3 spark://master-address:7077
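For workers started with arbitrary names, one way to get a true one-command shutdown is to go through the pid files directly instead of the instance numbers. This is a sketch, not an official Spark script; it assumes the daemons' pid files live in the default spark-daemon.sh location (SPARK_PID_DIR, falling back to /tmp):

```shell
# Stop every Spark worker daemon by its pid file, regardless of the
# instance name (numeric or not) it was started with.
stop_spark_workers() {
  for pidfile in "${SPARK_PID_DIR:-/tmp}"/spark-*-org.apache.spark.deploy.worker.Worker-*.pid; do
    [ -e "$pidfile" ] || continue        # glob matched nothing
    kill "$(cat "$pidfile")" 2>/dev/null # SIGTERM, as the stop scripts do
    rm -f "$pidfile"                     # clean up so a restart is not blocked
  done
}

stop_spark_workers
```

Unlike "pkill -f spark*", this only touches processes that Spark's own launch scripts registered, and it removes the stale pid files as well.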