pythonic pythonic - 9 months ago 68
Scala Question

"spark.memory.fraction" seems to have no effect

In Spark, I'm getting

java.lang.OutOfMemoryError: Java heap space
error when reading a String of around 1 GB from HDFS from within a function. The executor memory I use is 6 GB, though. To increase the user memory, I even decreased spark.memory.fraction to just 0.3, but I am still getting the same error. It seems as though decreasing that value had no effect. I am using Spark 1.6.1 and compiling against the Spark 1.6 core library. Am I doing something wrong here?

Answer Source

Please see SparkConf

Spark Executor OOM: How to set Memory Parameters on Spark

Once an app is running, the next most likely error you will see is an OOM on a Spark executor. Spark is an extremely powerful tool for doing in-memory computation, but its power comes with some sharp edges. The most common cause of an executor OOM is that the application is trying to cache or load too much information into memory. Depending on your use case, there are several solutions to this:

Increase the storage fraction variable. This can be set as above, either on the command line or in the SparkConf object. It takes a value between 0 and 1 that describes what portion of executor JVM memory will be dedicated to caching and storing RDDs. If you have a job that requires very little shuffle memory but uses a lot of cached RDDs, increase this variable (example: caching an RDD and then performing aggregates on it).

If all else fails, you may just need additional RAM on each worker.

Then increase the amount of RAM the application requests by setting the spark.executor.memory variable, either on the command line or in the SparkConf object.

In your case, it seems that the memory fraction setting was somehow not applied. As advised in the comments, you can print all applied settings to cross-check.
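One way to do that cross-check is sketched below. It assumes a standalone driver program; the app name "conf-check" and the local master are placeholders for illustration only (inside spark-shell an `sc` already exists and the first two statements are not needed).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver setup; replace the app name and master with your own.
val conf = new SparkConf()
  .setAppName("conf-check")
  .setMaster("local[*]")
  .set("spark.memory.fraction", "0.3")
val sc = new SparkContext(conf)

// Print every setting the running context actually holds, one per line.
sc.getConf.getAll.sorted.foreach { case (key, value) => println(s"$key=$value") }

// Look up the one setting in question; None means it was never applied.
println(sc.getConf.getOption("spark.memory.fraction"))

sc.stop()
```

If the last line prints None, or a value other than the one you set, the setting did not reach the context, for example because it was set after the SparkContext had already been created.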

If it is not applied, you can set it programmatically and see whether it takes effect.

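import org.apache.spark.SparkConf

// maxOnHeapExecutionMemory is not defined here; it comes from the test this snippet is taken from.
val conf = new SparkConf()
  .set("spark.memory.fraction", "1")
  .set("spark.testing.memory", maxOnHeapExecutionMemory.toString)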

as described in the test


Please go through this nice post to understand this in more detail.

The gist of the above post is:

You can see 3 main memory regions on the diagram:

1) Reserved Memory : Memory reserved by the system, and its size is hard coded

2) User Memory (in Spark 1.6: (“Java Heap” – “Reserved Memory”) * (1.0 – spark.memory.fraction))

This is the memory pool that remains after the allocation of Spark Memory, and it is completely up to you what is stored in this RAM and how. Spark does no accounting of what you do there or of whether you respect this boundary. Not respecting this boundary in your code might cause an OOM error.

3) Spark Memory ((“Java Heap” – “Reserved Memory”) * spark.memory.fraction) --> Memory pool managed by Spark. Further divided into

|--> Storage Memory

|--> Execution Memory
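
To make the formulas above concrete, here is a rough sizing sketch in plain Scala (no Spark dependency). The names MemoryRegions and estimateRegions are made up for this illustration. It assumes the reserved memory is 300 MB and the default spark.memory.fraction is 0.75, which is what Spark 1.6 uses as far as I know, and it treats the heap as exactly 6 GB, although the heap size the JVM actually reports is usually somewhat smaller than the configured executor memory.

```scala
// Sizes of the three regions described above, all in MB.
final case class MemoryRegions(reservedMb: Double, userMb: Double, sparkMb: Double)

// Assumption: 300 MB reserved, believed to be the hard-coded value in Spark 1.6.
def estimateRegions(heapMb: Double,
                    memoryFraction: Double,
                    reservedMb: Double = 300.0): MemoryRegions = {
  val usableMb = heapMb - reservedMb // "Java Heap" - "Reserved Memory"
  MemoryRegions(
    reservedMb = reservedMb,
    userMb = usableMb * (1.0 - memoryFraction), // User Memory
    sparkMb = usableMb * memoryFraction         // Spark Memory (storage + execution)
  )
}

// The asker's setup: 6 GB executor heap with the fraction lowered to 0.3,
// compared with the assumed Spark 1.6 default of 0.75.
val lowered = estimateRegions(heapMb = 6 * 1024, memoryFraction = 0.3)
val withDefault = estimateRegions(heapMb = 6 * 1024, memoryFraction = 0.75)

println(f"fraction 0.30: user = ${lowered.userMb}%.1f MB, spark = ${lowered.sparkMb}%.1f MB")
println(f"fraction 0.75: user = ${withDefault.userMb}%.1f MB, spark = ${withDefault.sparkMb}%.1f MB")
```

Under these assumptions, lowering the fraction from 0.75 to 0.3 raises User Memory from roughly 1,461 MB to roughly 4,090 MB, so if the setting really is applied, the difference should be large enough to notice.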