keypoint - 3 months ago
Linux Question in Spark sbin/ folder is not stopping all slave nodes

Hi, I have a Spark cluster in standalone mode: one Spark master process and three Spark slave processes, all running on my laptop (the whole cluster is on one machine).

To start the master and the slaves, I just run the scripts Spark_Folder/sbin/ and Spark_Folder/sbin/.

However, when I run Spark_Folder/sbin/, it stops only the master and one slave. Since I have three slaves running, two of them are still running afterwards.

I dug into the script "" and found the following:

if [ "$SPARK_WORKER_INSTANCES" = "" ]; then
  # No instance count set: only worker instance 1 is stopped
  "$sbin"/ stop org.apache.spark.deploy.worker.Worker 1
else
  for ((i=0; i<$SPARK_WORKER_INSTANCES; i++)); do
    "$sbin"/ stop org.apache.spark.deploy.worker.Worker $(( $i + 1 ))
  done
fi
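The branch above can be mirrored in a small standalone function that only prints the worker instance numbers the script would try to stop, which makes the behaviour easy to check without touching a running cluster. This is a sketch: the function name `worker_ids_to_stop` is hypothetical, and it uses a POSIX `while` loop in place of the bash `for ((...))` loop.

```sh
#!/bin/sh
# Hypothetical helper: print, one per line, the worker instance numbers that
# the stop logic quoted above would target. Nothing is actually stopped.
worker_ids_to_stop() {
  if [ "${SPARK_WORKER_INSTANCES:-}" = "" ]; then
    # Unset or empty: only instance 1 is targeted.
    echo 1
  else
    # Otherwise instances 1..SPARK_WORKER_INSTANCES are targeted.
    i=1
    while [ "$i" -le "$SPARK_WORKER_INSTANCES" ]; do
      echo "$i"
      i=$((i + 1))
    done
  fi
}
```

With `SPARK_WORKER_INSTANCES` unset it prints only `1`, which is consistent with just one of the three slaves being stopped; with `SPARK_WORKER_INSTANCES=3` it prints `1`, `2` and `3`.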

It seems this script decides which slaves to stop based on the SPARK_WORKER_INSTANCES number. But what if I start a slave with a non-numeric name?

Also, is there a way to shut down the whole Spark cluster with a single command? (I know running "pkill -f spark*" works, though.)
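One way to stop every worker regardless of how it was named is to discover the running instances instead of counting them. The sketch below assumes the daemon script records each process in a PID file named `spark-<ident>-org.apache.spark.deploy.worker.Worker-<instance>.pid` under `SPARK_PID_DIR` (default `/tmp`); that naming is an assumption to verify against your Spark version, and `list_worker_instances` is a hypothetical helper, not part of Spark.

```sh
#!/bin/sh
# Hypothetical helper: list the instance IDs of standalone workers by reading
# PID-file names. ASSUMPTION: files are named
#   spark-<ident>-org.apache.spark.deploy.worker.Worker-<instance>.pid
# inside the given directory, else SPARK_PID_DIR, else /tmp.
list_worker_instances() {
  pid_dir=${1:-${SPARK_PID_DIR:-/tmp}}
  prefix="org.apache.spark.deploy.worker.Worker-"
  for f in "$pid_dir"/spark-*-"$prefix"*.pid; do
    [ -e "$f" ] || continue      # glob matched nothing
    name=${f##*/}                # strip the directory
    name=${name%.pid}            # strip the .pid suffix
    echo "${name##*"$prefix"}"   # keep the text after the class name
  done
}
```

Each printed ID, numeric or not, could then be passed to the same `stop org.apache.spark.deploy.worker.Worker <instance>` call the loop above makes, assuming the stop command accepts whatever instance ID was used at start time. A blunter alternative that is still more targeted than `pkill -f spark*` is to match the worker's main class, e.g. `pkill -f org.apache.spark.deploy.worker.Worker`.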

Thanks a lot.


I just figured out the solution:

In "/usr/lib/spark/conf/", add an extra parameter "SPARK_WORKER_INSTANCES=3" (or however many slave instances you run), then run "/usr/lib/spark/sbin/" and all instances are stopped.
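The configuration change can be scripted so it is safe to repeat. This minimal sketch assumes the file in the truncated `conf/` path is a shell file sourced by the `sbin/` scripts; `set_worker_instances` is a hypothetical helper, and the file path is passed as an argument rather than guessed.

```sh
#!/bin/sh
# Hypothetical helper: make sure FILE contains exactly one
# "SPARK_WORKER_INSTANCES=N" line, keeping every other line.
# ASSUMPTION: FILE is a shell file sourced by the sbin/ scripts.
set_worker_instances() {
  file=$1
  n=$2
  tmp=$(mktemp) || return 1
  # Drop any existing assignment (with or without "export").
  if [ -f "$file" ]; then
    grep -v -E '^[[:space:]]*(export[[:space:]]+)?SPARK_WORKER_INSTANCES=' "$file" > "$tmp"
  fi
  echo "SPARK_WORKER_INSTANCES=$n" >> "$tmp"
  # Write back in place so the file keeps its original permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Replacing an earlier assignment instead of appending blindly means the file never holds two conflicting values for the worker count.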

However, "" works only for slaves you started with numeric instance IDs, e.g.:

/usr/lib/spark/sbin/ 1 spark://master-address:7077
/usr/lib/spark/sbin/ 2 spark://master-address:7077
/usr/lib/spark/sbin/ 3 spark://master-address:7077

If you start slaves with arbitrary names, "" does not work, e.g.:

/usr/lib/spark/sbin/ myWorker1 spark://master-address:7077
/usr/lib/spark/sbin/ myWorker2 spark://master-address:7077
/usr/lib/spark/sbin/ myWorker3 spark://master-address:7077
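The reason arbitrary names escape can be checked directly: the stop loop only ever passes the IDs `1` to `SPARK_WORKER_INSTANCES`, so a worker started with any other ID has nothing that targets it. The hypothetical helper `ids_missed_by_stop_loop` below takes that count as its first argument (pass `""` for unset, as in the single-instance branch) followed by the IDs you actually started, and prints the ones the loop would miss.

```sh
#!/bin/sh
# Hypothetical helper: usage is ids_missed_by_stop_loop N ID...
# N is SPARK_WORKER_INSTANCES ("" means unset, i.e. only ID 1 is stopped).
# Prints every started ID that is not exactly one of 1..N.
ids_missed_by_stop_loop() {
  n=${1:-1}
  shift
  for id in "$@"; do
    hit=no
    i=1
    while [ "$i" -le "$n" ]; do
      # String comparison: the loop passes plain decimals like "2".
      [ "$id" = "$i" ] && hit=yes
      i=$((i + 1))
    done
    [ "$hit" = yes ] || echo "$id"
  done
}
```

For three numbered workers and a count of 3 it prints nothing; for the named workers it prints all three; and with the count unset and workers 1 to 3 it prints `2` and `3`, matching the two slaves left running in the question.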