Volodymyr Bakhmatiuk - 9 months ago
Java Question

Memory-efficient way to store many duplicates of integers > 127

I want to parse a file and keep it in-memory as

Map<aID, Set<bID>>

unique_a_IDs = 50.000;
unique_b_IDs = 1.000;
avg_set_length = 50;

As you can see, all sets combined will hold unique_a_IDs * avg_set_length = 2.500.000 bIDs, where each bID is between 0 and 1000. So on average each bID will be stored 2500 times, and I don't want the JVM to allocate memory 2500 times for each integer.

Is there any trick to keep that data structure memory-efficient?

The problem is that I can't (at least I don't know how yet) use Java's integer/string pools. The Integer pool works only for numbers in the range -128...127. The String pool works only for compile-time constants, but I read my IDs from a file.

Code example

import java.util.*;

public class MemoryTest {

    private static final int A_IDS_AMOUNT = 65536;
    private static final int B_IDS_AMOUNT = 1000;
    private static final int AVERAGE_SET_LENGTH = 50;
    private static final Random rand = new Random();

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> map = new HashMap<>(A_IDS_AMOUNT);
        for (int i = 0; i < A_IDS_AMOUNT; i++) {
            Set<Integer> set = genRandomSet();
            map.put(i, set);
        }
        // SizeOf is a premain class that uses Java instrumentation
        long size = new SizeOf().deepsize(map) / (1024 * 1024);
        System.out.println("Memory used by object: " + size + " MB"); // results in 175 MB
    }

    // Builds a set of up to AVERAGE_SET_LENGTH random bIDs in [0, B_IDS_AMOUNT)
    private static Set<Integer> genRandomSet() {
        Set<Integer> set = new HashSet<>(AVERAGE_SET_LENGTH);
        for (int i = 0; i < AVERAGE_SET_LENGTH; i++) {
            set.add(rand.nextInt(B_IDS_AMOUNT));
        }
        return set;
    }
}

Answer

There's a java.lang.Integer.IntegerCache.high system property in Java 7 and higher that you can set (e.g. -Djava.lang.Integer.IntegerCache.high=<size>) to cache Integers up to a higher-than-default value; see the source code of java.lang.Integer.IntegerCache.
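
One way to check whether such a setting took effect is to box the same value twice and compare the results by identity. The following is only a minimal sketch: the class name IntegerCacheCheck and the method isCached are made up for illustration and are not part of the JDK or of the question's code.

```java
// Hypothetical helper, for illustration only.
class IntegerCacheCheck {

    // Returns true if boxing v twice yields the same Integer instance,
    // i.e. v is inside the range currently covered by the Integer cache.
    static boolean isCached(int v) {
        Integer first = Integer.valueOf(v);
        Integer second = Integer.valueOf(v);
        return first == second; // deliberate identity comparison
    }

    public static void main(String[] args) {
        // Always true: values in -128..127 are guaranteed to be cached.
        System.out.println("100 cached:  " + isCached(100));
        // Depends on how the JVM was started (the default upper bound is 127).
        System.out.println("1000 cached: " + isCached(1000));
    }
}
```

Running it once without the property and once with -Djava.lang.Integer.IntegerCache.high=1000 should show whether the second line changes on your JVM.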

However, I doubt raising the cache limit will help you much, since the Map and the Sets themselves will still consume far more memory than the Integers.
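
If the collections are the main cost, one possible direction is to avoid boxed bIDs altogether. The sketch below is an assumption-laden alternative, not something proposed in the answer: it assumes every bID is a small non-negative int (0 to 1000, as stated in the question) and stores one java.util.BitSet per aID. The class name BitSetStore and its methods are hypothetical. A BitSet sized for 1000 bits holds them in 16 longs (128 bytes) plus object overhead, regardless of how many bIDs are set.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative layout: one BitSet per aID instead of a Set<Integer>.
class BitSetStore {

    // Assumption taken from the question: bIDs lie in [0, 1000).
    static final int B_IDS_AMOUNT = 1000;

    private final Map<Integer, BitSet> map = new HashMap<>();

    // Records that bId belongs to aId; no Integer object is created for bId.
    void add(int aId, int bId) {
        map.computeIfAbsent(aId, k -> new BitSet(B_IDS_AMOUNT)).set(bId);
    }

    // Returns true if bId was previously added for aId.
    boolean contains(int aId, int bId) {
        BitSet bits = map.get(aId);
        return bits != null && bits.get(bId);
    }

    // Number of distinct bIDs stored for aId (0 if aId is unknown).
    int count(int aId) {
        BitSet bits = map.get(aId);
        return bits == null ? 0 : bits.cardinality();
    }

    public static void main(String[] args) {
        BitSetStore store = new BitSetStore();
        store.add(7, 500);
        store.add(7, 999);
        System.out.println(store.contains(7, 500));
        System.out.println(store.count(7));
    }
}
```

Whether this actually beats the 175 MB measured in the question would need to be confirmed with the same SizeOf measurement; the trade-off is that iteration returns bit indexes rather than Integer objects.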