Volodymyr Bakhmatiuk - 1 month ago
Java Question

Memory-efficient way to store many duplicates of integers > 127

I want to parse a file and keep it in memory as a Map<aID, Set<bID>>.

unique_a_IDs = 50,000;
unique_b_IDs = 1,000;
avg_set_length = 50;


As you can see, all the sets together will hold unique_a_IDs * avg_set_length = 2,500,000 bIDs, where each bID is between 0 and 999. So on average each bID will be stored 2,500 times, and I don't want the JVM to allocate memory 2,500 times for each integer.
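As a rough check of these numbers, here is a back-of-the-envelope estimate. The 16 bytes per boxed Integer is an assumption (typical for a 64-bit HotSpot JVM with compressed object pointers) and differs between JVMs and settings:

$$
\begin{aligned}
N_{\text{total}} &= 50{,}000 \times 50 = 2{,}500{,}000 \\
N_{\text{per bID}} &= \frac{2{,}500{,}000}{1{,}000} = 2{,}500 \\
M_{\text{Integer}} &\approx 2{,}500{,}000 \times 16\ \text{bytes} = 40{,}000{,}000\ \text{bytes} \approx 38\ \text{MiB}
\end{aligned}
$$

So even if every occurrence of a bID shared a single Integer instance, the saving would be bounded by roughly this amount; the references to those instances stored inside the sets would remain.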


Is there any trick to keep that data structure memory-efficient?

The problem is that I can't (or at least don't yet know how to) use Java's Integer/String pools. The Integer pool only covers numbers in the range -128..127, and the String pool is applied automatically only to compile-time constants, but I read my bIDs from a file.
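One workaround that sidesteps the built-in pools is to canonicalize the values yourself: create one Integer per possible bID up front and always hand out that instance. A minimal sketch, assuming bIDs are non-negative and below a known bound (1,000 here); BIdCache and its methods are hypothetical names, not a library API:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of the JDK): hands out one shared Integer
// instance per value in [0, size), so repeated bIDs reuse the same object.
public class BIdCache {
    private final Integer[] cache;

    public BIdCache(int size) {
        cache = new Integer[size];
        for (int i = 0; i < size; i++) {
            // Integer.valueOf may or may not return a JDK-cached instance;
            // either way, exactly one object per value is kept in this array.
            cache[i] = Integer.valueOf(i);
        }
    }

    // Returns the canonical Integer for id; throws
    // ArrayIndexOutOfBoundsException if id is outside [0, size).
    public Integer canonical(int id) {
        return cache[id];
    }

    public static void main(String[] args) {
        BIdCache ids = new BIdCache(1000);
        Set<Integer> set = new HashSet<>();
        set.add(ids.canonical(500));
        Integer a = ids.canonical(999);
        Integer b = ids.canonical(999);
        System.out.println(a == b); // same object, so this prints true
    }
}
```

When parsing, look up the shared instance instead of autoboxing the parsed value, e.g. set.add(ids.canonical(Integer.parseInt(token))), where token is a hypothetical field read from the file. For IDs without a small known bound, a HashMap<Integer, Integer> used as an intern table plays the same role, at the cost of its own per-entry overhead.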

Code example:

import java.util.*;

public class MemoryTest {

    private static final int A_IDS_AMOUNT = 65536;
    private static final int B_IDS_AMOUNT = 1000;
    private static final int AVERAGE_SET_LENGTH = 50;
    private static final Random rand = new Random();

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> map = new HashMap<>(A_IDS_AMOUNT);
        for (int i = 0; i < A_IDS_AMOUNT; i++) {
            Set<Integer> set = genRandomSet();
            map.put(i, set);
        }
        // SizeOf is my own premain class (a java.lang.instrument agent)
        // that walks the object graph; it is not part of the JDK.
        long size = new SizeOf().deepsize(map) / (1024 * 1024);
        System.out.println("Memory used by object: " + size + " MB"); // reported 175 MB
    }

    // Builds a set of up to AVERAGE_SET_LENGTH random bIDs in [0, B_IDS_AMOUNT);
    // duplicates returned by rand.nextInt are dropped by the set.
    private static Set<Integer> genRandomSet() {
        Set<Integer> set = new HashSet<>(AVERAGE_SET_LENGTH);
        for (int i = 0; i < AVERAGE_SET_LENGTH; i++) {
            set.add(rand.nextInt(B_IDS_AMOUNT));
        }
        return set;
    }
}

Answer

Since Java 7 there is a java.lang.Integer.IntegerCache.high system property that you can set (e.g. -Djava.lang.Integer.IntegerCache.high=<size>) to make Integer cache values up to a higher-than-default upper bound; see the source code of java.lang.Integer.IntegerCache.
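A minimal way to observe the effect of that property, assuming a HotSpot-based JDK (the class name CacheCheck is only for illustration). Integer.valueOf, which javac uses for autoboxing, is documented to always cache -128..127; whether 1000 shares an instance depends on the configured upper bound:

```java
// Run with, e.g.:
//   java -Djava.lang.Integer.IntegerCache.high=1000 CacheCheck
public class CacheCheck {
    public static void main(String[] args) {
        // Guaranteed by the Integer.valueOf contract: -128..127 are always cached.
        boolean small = Integer.valueOf(127) == Integer.valueOf(127);
        // Implementation-dependent: true only if the cache's upper bound covers 1000.
        boolean large = Integer.valueOf(1000) == Integer.valueOf(1000);
        System.out.println("127 shared: " + small);
        System.out.println("1000 shared: " + large);
    }
}
```

Without the flag, the second line typically prints false; with the property set to at least 1000 it should print true.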

However, I doubt that will help you much, since the Map and the Sets themselves (the hash-table nodes and bucket arrays behind every HashSet) will still consume much more memory than the Integer objects.
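If the goal is to shrink the structure as a whole, one option (not part of the original answer) is to drop the boxed sets entirely. A minimal sketch, assuming aIDs are dense integers 0..N-1 as in the question's test code and every bID fits in a short; CompactSets and its methods are hypothetical names:

```java
import java.util.Arrays;

// Hypothetical alternative layout: each aID's set is stored as a sorted,
// duplicate-free short[] instead of a HashSet<Integer>.
public class CompactSets {
    private final short[][] sets;

    public CompactSets(int aIdCount) {
        sets = new short[aIdCount][];
    }

    // Stores bIds for aId, sorted and with duplicates removed.
    public void put(int aId, int[] bIds) {
        int[] sorted = bIds.clone();
        Arrays.sort(sorted);
        short[] packed = new short[sorted.length];
        int n = 0;
        for (int v : sorted) {
            if (v < 0 || v > Short.MAX_VALUE) {
                throw new IllegalArgumentException("bID out of range: " + v);
            }
            // Input is sorted, so a duplicate always equals the last kept value.
            if (n == 0 || v != packed[n - 1]) {
                packed[n++] = (short) v;
            }
        }
        sets[aId] = Arrays.copyOf(packed, n);
    }

    // Membership test via binary search on the sorted array.
    public boolean contains(int aId, int bId) {
        if (bId < 0 || bId > Short.MAX_VALUE) {
            return false;
        }
        short[] s = sets[aId];
        return s != null && Arrays.binarySearch(s, (short) bId) >= 0;
    }

    // Number of distinct bIDs stored for aId (0 if nothing was stored).
    public int size(int aId) {
        short[] s = sets[aId];
        return s == null ? 0 : s.length;
    }
}
```

Each set then costs about 2 bytes per bID plus one array header, instead of a hash-table node and a reference (and, without caching, an Integer object) per bID. The trade-offs are O(log n) lookups and the fact that adding a single bID later means rebuilding that aID's array. A java.util.BitSet of 1,000 bits per aID would be another compact option with constant-time lookups.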