Abhinandan - 1 year ago
Java Question

Why does Spark Standalone cluster not use all available cores?

I have the following configuration for an Apache Spark 1.2.1 Standalone cluster:

  • Hadoop 2.6.0

  • 2 nodes (one master and one slave) in the Standalone cluster

  • a 3-node Cassandra cluster

  • total cores: 6 (2 on the master, 4 on the slave)

  • total memory: 13 GB

I submit the job to the Standalone cluster manager as:

./spark-submit --class com.b2b.processor.ProcessSampleJSONFileUpdate \
--conf num-executors=2 \
--executor-memory 2g \
--driver-memory 3g \
--deploy-mode cluster \
--supervise \
--master spark://abc.xyz.net:7077 \
hdfs://abc:9000/b2b/b2bloader-1.0.jar ds6_2000/*.json

My job executes successfully, i.e. it reads the data from the files and inserts it into Cassandra.

The Spark documentation says that a Standalone cluster uses all available cores by default, but my cluster is using only 1 core per application. Also, after starting the application, the Spark UI shows Applications: 0 Running and Drivers: 1 Running.

My questions are:

  1. Why is it not using all 6 available cores?

  2. Why is the Spark UI showing Applications: 0 Running?

The code:

public static void main(String[] args) throws Exception {
    String fileName = args[0];
    System.out.println("----->Filename : " + fileName);

    Long now = new Date().getTime();

    SparkConf conf = new SparkConf(true)
            .setAppName("JavaSparkSQL_" + now)
            .set("spark.executor.memory", "1g")
            .set("spark.cassandra.connection.host", "")
            .set("spark.cassandra.connection.native.port", "9042")
            .set("spark.cassandra.connection.rpc.port", "9160");

    JavaSparkContext ctx = new JavaSparkContext(conf);

    JavaRDD<String> input = ctx.textFile("hdfs://abc.xyz.net:9000/dataLoad/resources/" + fileName, 6);
    JavaRDD<DataInput> result = input.mapPartitions(new ParseJson()).filter(new FilterLogic());

    System.out.print("Count --> " + result.count());
    System.out.println(StringUtils.join(result.collect(), ","));
}

Answer

If you're setting the master in your app to local (via .setMaster("local")), your application will not connect to spark://abc.xyz.net:7077.

You don't need to set the master in the app at all when you supply it via the --master flag of spark-submit.
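As a rough sketch of what the fixed driver setup might look like (the `spark.cores.max` value is illustrative, and whether you cap it at all is your choice): build the SparkConf without any `.setMaster(...)` call, so the master URL passed to spark-submit via `--master` takes effect. In standalone mode an application claims all available cores unless `spark.cores.max` limits it.

```java
import java.util.Date;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ProcessSampleJSONFileUpdate {
    public static void main(String[] args) {
        // Note: no .setMaster(...) here. The master URL supplied on the
        // command line (--master spark://abc.xyz.net:7077) is used instead,
        // so the app actually registers with the standalone cluster.
        SparkConf conf = new SparkConf(true)
                .setAppName("JavaSparkSQL_" + new Date().getTime())
                // Optional, value illustrative: cap how many cores this app
                // may claim; without it, standalone mode offers all cores.
                .set("spark.cores.max", "6");

        JavaSparkContext ctx = new JavaSparkContext(conf);
        // ... read the JSON files and write to Cassandra as before ...
        ctx.stop();
    }
}
```

Once the `.setMaster("local")` call is gone, the Spark UI should show the application under Applications: Running rather than only the driver.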