flxh flxh - 18 days ago 4x
Java Question

Taking 'snapshots' of big Objects

I'm facing the following issue. One Thread is building and updating a Tree Object. To validate the Tree it is required to calculate the hash of that tree. So a second Thread calculates the hash of that tree continuously.

Now I got the following problem: That tree is about 300mb in size and I wanna ensure that the Tree doesn't change while the hash is calculated, like taking a snapshot and calculate the hash of it.

My guess is I have the following two options:

  1. Blocking writing to the Tree while the hash is calculated.
    (not ideal, because the calculations takes quite some time)

  2. Taking a 'snapshot' by copying that object. And then calculate the hash.
    (also not very good because another 300mb of memory would be required)

Is there a common trick or a pattern for taking 'snapshots' of big objects without just copying them?

(My guess is that this takes profounded changes to the tree object, but I'm grateful for every tip.)

Thanks in advance,

PS: I don't know if it matters for that question but I am using Java (1.8)


I think you should neither block while calculating a new hash nor copying or taking snapshots of the whole tree, especially if the tree takes about 300 mb of memory.

Instead, I'd take another approach. I'd use an incremental hash function. I'm not an expert in these matters, but the best one I know so far is Murmur3F from greenrobot common utility library. Please check their samples.

Murmur3F lets you call its update() method many times. Then you call getValue() to get the actual hash value. And you can do this many times. So I wouldn't recalculate the hash for the whole tree in a separate thread each time it's modified. Using that Murmur3F hash implementation, I'd use the update() method on every update to the tree and getValue() on tree's getHash(), for instance.