Fab Fab - 28 days ago 11
Java Question

Managing JAVA Heap size during storing data from Oracle Database

I need to store in memory a huge amount of data by looping over a result set (Oracle database).

I carried out some test and, by using a profiler, I noticed that there is a considerable difference between the

heap size
and the
used heap
(i.e. my data). Here an example.

enter image description here

I already saw the available JVM arguments in order to set the right heap size, but the problem is that I don't know in advance how many bytes data will occupy (since the amount of data can vary from one test to another).

By observing the graph in the image, the problem seems to be the memory "peaks" during the execution.

Could these peaks be related to the number of fetched rows (or in general to the extracted data?

Is there a way to avoid this effect, by keeping memory almost constant (so that the heap size doesn't increase excessively)?

Thanks

Answer

By looking at your memory chart, it seems much of the data is of a temporary nature and can be removed from heap at some point. The final ratio of used heap vs. its total size says it all.

It seems like the temporary data's (e.g. buffered data from an Oracle ResultSet) time to live is too high or the eden space it too small and thus data is being moved from the eden and/or survivor space to the old generation space where it's being collected as a result of the JVM detecting the need to run the GC on the old generation space. This possibly happens when you iterate over your ResultSet and the Oracle driver needs to fetch the next chunk of data from the database, which can be fairly large.

At this point I should go a little bit into detail about the Oracle ResultSet buffer. It's basically just a chunk of bytes on the heap. Depending on the column data it is stored as something different than you'd read from the ResultSet. Take a java.sql.Timestamp for instance. Inside the buffer it's stored as an oracle.sql.TIMESTAMP or even just plain bytes. This means that whenever you extract a java.sql.Timestamp from a ResultSet there's the need for another object to be allocated. And this object is most likely the "final" object you want to keep eventually.

I suggest tuning the JVM's GC to your needs. Maybe you can figure out which data is being constantly collected. Try adjusting the eden size so the JVM doesn't need to promote too much to the old generation. You can also adjust how much new space the JVM allocates on demand and how it shrinks when detecting a gap in usage and allocated size.

You can find a list of JVM options here.