MallowFox MallowFox -4 years ago 76
Java Question

Any way to compress java arraylist?

I have a data structure:

ArrayList<String>[] a = new ArrayList[100000];


each list has about 1000 strings with about 100 characters.

I'm doing an one-off job with it, and it cost a little more memory than I can bear.

I think I can change less code if I can find ways to reduce some memory cost , as the cost is not too much , and it's just an one-off job. So, please tell me all possible ways you know.

add some info: the reason I;m using a array of arraylists is that the size 100000 is what I can know now. But I don't know the size of each arraylist before I work through all the data.

And the problem is indeed too much data, so I want to find ways to compress it. It's not a allocation problem. There will finally be too much data to exceed the memory.

Answer Source

it cost a little more memory than I can bear

So, how much is "a little"?

Some quick estimates:

You have collections of string of 1000x100 characters. That should be about 1000x100x2 = 200kb of string data.

If you have 100000 of those, you'll need almost 20Gb for the data alone.

Compared to the 200kb of each collection's data the overhead of your data structures is miniscule, even if it was 100 bytes for each collection (0.05%).

So, not much to be gained here.

Hence, the only viable ways are:

  • Data compression of some kind to reduce the size of the 20Gb payload

  • Use of external storage, e.g. by only reading in strings which are needed at the moment and then discarding them

To me, it is not clear if your memory problem really comes from the data structure you showed (did you profile the program?) or from the total memory usage of the program. As I commented on another answer, resizing an array(list) for instance temporarily requires at least 2x the size of the array(list) for the copying operation. Then notice that you can create memory leaks in Java - or just be holding on to data you actually won't need again.

Edit:

A String in Java consists of an array of chars. Every char occupies two bytes.

You can convert a String to a byte[], where any ASCII character should need one byte only (non-ASCII characters will still need 2 (or more) bytes):

str.getBytes(Charset.forName("UTF-8"))

Then you make a Comparator for byte[] and you're good to go. (Notice though that byte has a range of [-128,127] which makes comparing non-intuitive in this case; you may want to compare (((int)byteValue) & 0xff).)

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download