I am running an application with the following code. I don't understand why only 1 executor is in use even though I have 3. When I try to increase the range, my job fails cause the task manager loses executor.
In the summary, I see a value for shuffle writes but shuffle reads are 0 (maybe cause all the data is on one node and no shuffle read needs to happen to complete the job).
val rdd: RDD[(Int, Int)] = sc.parallelize((1 to 10000000).map(k => (k -> 1)).toSeq)
val rdd2= rdd.sortByKeyWithPartition(partitioner = partitioner)
val sorted = rdd2.map((_._1))
val count_sorted = sorted.collect()
..maybe cause all the data is on one node
That should make you think that your RDD has only one partition, instead of 3, or more, that would eventually utilize all the executors.
So, extending on Hokam's answer, here's what I would do:
as described in Partitions.
Now if that is 1, then repartition your RDD, like this:
rdd = rdd.repartition(3)
which will partition your RDD into 3 partitions.
Try executing your code again now.