Wang Liqin - 27 days ago
Java Question

Overhead of GC for memory in the JVM vs Swift-style ARC

The company I work at has some very different viewpoints regarding the JVM development platform.

Based on this paper here - http://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

They are saying that an Oracle JVM requires 3-5x memory overhead, i.e. to operate a 1 GB JVM we require an extra 3-5 GB of RAM to counteract the JVM overhead, and that Swift-style ARC is the advanced alien tech Apple has developed, the silver bullet that answers the GC issues.

I have made some counter-arguments: the study was not conducted on an Oracle/Sun JVM but on an experimental VM, and ARC has its own problems, such as circular references.

Is there any research on what exactly (or approximately) the memory overhead of GC in the JVM is? I could not find any.

My questions, summarized:

1) Is there any visible memory overhead for GC? The 3-5x cost in RAM seems really unreasonable if it is true.

Also, big data applications such as Apache Spark, HBase and Cassandra operate at terabyte/petabyte memory scale. If there is such an overhead in GC, why would they be developed on such a platform?

2) ARC is considered to be inferior to runtime tracing GC algorithms. If this is true, it would also be helpful to know of any papers directly comparing the effects of ARC's compile-time malloc/free vs the JVM's runtime GC cleanup.

There is a claim by Chris Lattner that GC consumes 3-5x the memory, here - https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160208/009422.html

Answer

Is there any visible memory overhead for GC? The 3-5x cost in RAM seems really unreasonable if it is true.

This is most likely a misunderstanding. You can run a JVM where 99% of the heap is used; however, it will GC regularly. If you give the application more memory, it will be able to work more efficiently: adding more memory to the heap can improve throughput. I have seen this help with heaps up to about 3x the minimum required; except in extreme cases, you are unlikely to see any benefit from adding more than that.
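For illustration, here is a minimal, self-contained sketch (class name and sizes are my own, not from the original answer) that you can run with different -Xmx settings: the live data stays the same, but a larger heap means the short-lived garbage triggers fewer collections.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapHeadroom {
    public static void main(String[] args) {
        // Keep roughly 100 MB of live data on the heap.
        List<byte[]> live = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            live.add(new byte[1024 * 1024]);
        }
        // Churn through short-lived allocations; with a larger -Xmx the GC runs less often.
        for (int i = 0; i < 10_000; i++) {
            byte[] garbage = new byte[1024 * 1024];
            garbage[0] = 1; // touch it so the allocation is not trivially removed
        }
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        long maxMb = rt.maxMemory() / (1024 * 1024);
        System.out.println("Live data: " + live.size() + " MB, used heap: "
                + usedMb + " MB, max heap: " + maxMb + " MB");
    }
}
```

Try, for example, `java -Xmx256m HeapHeadroom` versus `java -Xmx1g HeapHeadroom` with GC logging enabled to see the difference in collection frequency.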

Also, big data applications such as Apache Spark, HBase and Cassandra operate at terabyte/petabyte memory scale. If there is such an overhead in GC, why would they be developed on such a platform?

When working with big data, you often use memory-mapped files and off-heap memory. This places the bulk of the data under the management of the application rather than the GC. This is no different from how a database written in C++ might operate.
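As a hedged sketch of that approach (file name and size are made up for illustration), a memory-mapped file keeps the bulk of the data outside the Java heap, so the GC only sees a small buffer object:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class OffHeapExample {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("data.bin"); // hypothetical data file
        long size = 1L << 30;              // 1 GB mapped region

        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping object itself is tiny on the Java heap;
            // the 1 GB of data is managed by the OS page cache, not the GC.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            map.putLong(0, 42L);                // write without creating garbage
            System.out.println(map.getLong(0)); // read it back
        }
    }
}
```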

ARC is considered to be inferior to runtime tracing GC algorithms.

I couldn't comment on how smart ARC is. Java doesn't place any restrictions on how the GC should operate, but the subtext is: it has to at least handle circular references. Anything less is assumed to be unacceptable.
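A small example (mine, not part of the original answer) of that point: the JVM's tracing GC reclaims a cycle once it becomes unreachable, which is exactly the case naive reference counting cannot handle.

```java
import java.lang.ref.WeakReference;

public class CycleDemo {
    static class Node {
        Node other; // each node points at the other, forming a cycle
    }

    public static void main(String[] args) throws InterruptedException {
        Node a = new Node();
        Node b = new Node();
        a.other = b;
        b.other = a;

        // WeakReference is used only to observe whether the object was collected.
        WeakReference<Node> watcher = new WeakReference<>(a);
        a = null;
        b = null; // the cycle is now unreachable from any GC root

        System.gc();       // request a collection (a hint, not a guarantee)
        Thread.sleep(100);
        System.out.println("Cycle collected: " + (watcher.get() == null));
    }
}
```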

BTW Java uses malloc/free via direct ByteBuffers.
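For example, a direct ByteBuffer allocates native (malloc-style) memory outside the Java heap; the size here is arbitrary:

```java
import java.nio.ByteBuffer;

public class DirectBufferExample {
    public static void main(String[] args) {
        // 64 MB of native memory; it is released when the buffer becomes unreachable,
        // and the total can be capped with -XX:MaxDirectMemorySize.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        offHeap.putInt(0, 123);
        System.out.println("isDirect=" + offHeap.isDirect()
                + " value=" + offHeap.getInt(0));
    }
}
```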

jobs with datasets such as 1 GB

What makes a data set 1 GB? Compressed on disk, it might be 100 MB. As raw uncompressed data, it might be 1 GB. In memory as a data structure, it might be 2 GB, and throughput might be faster if you use, say, another 1 or 2 GB to work on that data structure.
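A rough sketch of that sizing effect (the numbers are illustrative, not a benchmark): the same raw values take noticeably more heap once wrapped in Java object structures, because of object headers, references and boxing.

```java
import java.util.ArrayList;
import java.util.List;

public class FootprintSketch {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long before = usedHeap(rt);

        // 10 million boxed longs: ~80 MB of raw long values, but considerably
        // more heap as Long objects plus the list's reference array.
        List<Long> values = new ArrayList<>(10_000_000);
        for (long i = 0; i < 10_000_000; i++) {
            values.add(i);
        }

        long after = usedHeap(rt);
        System.out.printf("Raw data: ~%d MB, heap used by structure: ~%d MB%n",
                (10_000_000L * Long.BYTES) / (1024 * 1024),
                (after - before) / (1024 * 1024));
        System.out.println(values.size()); // keep the list reachable
    }

    private static long usedHeap(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }
}
```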
