pythonic pythonic - 24 days ago 11
Scala Question

Which type of Spark memory should I increase on Java out of memory error?

I have a pattern like the one shown below.

def someFunction(...) : ... =
{
// Somewhere here some large string (still < 1 GB) is made ...
// ... and sometimes I get java.lang.OutOfMemoryError while building that string
}

....
val RDDb = RDDa.map(x => someFunction(...))


Inside someFunction, a large string is built at one point. It is not that big (< 1 GB), but I sometimes get a java.lang.OutOfMemoryError: Java heap space error while building it. This happens even when my executor memory is quite large (8 GB).

According to this article, there is User memory and Spark memory. Now in my case, which one's fraction should I increase, the User memory's or the Spark memory's?

P.S: I am using Spark version 2.0

Answer

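A 1 GB raw string can easily use more than 8 GB of memory, in part because the JVM stores 2 bytes per character (on Java 8 and earlier) and a StringBuilder copies its contents into a larger array each time it grows. It's better to use streaming processing, such as XMLEventReader for XML.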
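
As a rough illustration of the streaming idea, here is a minimal sketch that walks an XML document event by event instead of building it into one large string. It assumes the JDK's StAX API (javax.xml.stream) rather than any particular Scala XML library, and the object name, the countElements helper, and the sample input are hypothetical, made up only for this example.

```scala
import java.io.{Reader, StringReader}
import javax.xml.stream.XMLInputFactory

object StreamingXmlSketch {
  // Counts start tags with the given local name. Only the current event is
  // held in memory, so the whole document is never materialized as a String.
  def countElements(input: Reader, name: String): Int = {
    val events = XMLInputFactory.newInstance().createXMLEventReader(input)
    var count = 0
    try {
      while (events.hasNext) {
        val event = events.nextEvent()
        if (event.isStartElement &&
            event.asStartElement().getName.getLocalPart == name) {
          count += 1
        }
      }
    } finally {
      events.close()
    }
    count
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical tiny input; in practice the Reader would wrap a file or stream.
    val xml = "<items><item>a</item><item>b</item></items>"
    println(countElements(new StringReader(xml), "item"))
  }
}
```

The same shape applies to output: writing pieces to a Writer or OutputStream as they are produced avoids holding the full result on the heap.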

For reference, see the memory estimation in the book Algorithms by Robert Sedgewick and Kevin Wayne: each string has 56 bytes of overhead.
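
To make that estimate concrete, here is a back-of-the-envelope sketch. It takes the 56-byte overhead quoted above and assumes 2 bytes per character (a UTF-16 char array, as on Java 8 and earlier). The growth rule used for the builder (new capacity = 2 * old + 2) is also an assumption about the JDK's StringBuilder, not something measured here, and all names are hypothetical.

```scala
object StringMemoryEstimate {
  val OverheadBytes = 56L // per-string overhead quoted from the book
  val BytesPerChar  = 2L  // assumption: UTF-16 char array (Java 8 and earlier)

  // Rough size of a finished String holding `chars` characters.
  def estimateBytes(chars: Long): Long =
    OverheadBytes + BytesPerChar * chars

  // Rough transient footprint while a full builder grows: the old array and
  // the new, roughly doubled array are both alive during the copy.
  // Assumption: new capacity = 2 * old capacity + 2.
  def peakBytesWhileGrowing(capacityChars: Long): Long =
    BytesPerChar * capacityChars + BytesPerChar * (2 * capacityChars + 2)

  def main(args: Array[String]): Unit = {
    val mib = 1024L * 1024
    val oneGiChars = 1024L * mib
    println(s"finished string: ${estimateBytes(oneGiChars) / mib} MiB")
    println(s"growth peak at 512 Mi chars: ${peakBytesWhileGrowing(512L * mib) / mib} MiB")
  }
}
```

Under these assumptions a string of about one billion characters needs roughly 2 GiB on its own, and a builder can briefly need about three times its current size while it grows, before counting the extra copy made by toString.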

I wrote a simple test program and ran it with -Xmx8G.

object TestStringBuilder {
  val m = 1024 * 1024
  def memUsage(): Unit = {
    val runtime = Runtime.getRuntime

    println(
      s"""max: ${runtime.maxMemory() / m} M 
         |allocated: ${runtime.totalMemory() / m} M 
         |free: ${runtime.freeMemory() / m} M""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val builder = new StringBuilder()
    val size = 10 * m
    try {
      while (true) {
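        // Each append adds the decimal text of a random Double (variable length),
        // so the length only occasionally lands on an exact multiple of `size`.
        builder.append(Math.random())
        if (builder.length % size == 0) {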
          println(s"len is ${builder.length / m} M")
          memUsage()
        }
      }
    }
    catch {
      case ex: OutOfMemoryError =>
        println(s"OutOfMemoryError len is ${builder.length/m} M")
        memUsage()
      case ex: Throwable =>
        println(ex)
    }
  }
}

Output might be something like this.

len is 140 M
max: 7282 M allocated: 673 M free: 77 M
len is 370 M
max: 7282 M allocated: 2402 M free: 72 M
len is 470 M
max: 7282 M allocated: 1479 M free: 321 M
len is 720 M
max: 7282 M allocated: 3784 M free: 314 M
len is 750 M
max: 7282 M allocated: 3784 M free: 314 M
len is 1020 M
max: 7282 M allocated: 3784 M free: 307 M
OutOfMemoryError len is 1151 M
max: 7282 M allocated: 3784 M free: 303 M