LaRage LaRage - 1 month ago 6
C# Question

What are the memory limits of a HashSet?

What are the memory limitations of a HashSet<string> in C#?

I've read that .NET has a memory limit of 2GB per object. Is this information still accurate? Does it apply to HashSets?

I'm currently working on an application that uses a large HashSet. When I build the DLLs for a 64-bit environment, I only get an OutOfMemoryException once my laptop, which has 8GB of RAM, reaches its memory limit.

If I had 16GB of RAM, would the object keep growing until it reached the hardware limit?

Answer

There is a 2GB limit per object, but remember that a field of a reference type only occupies the pointer size (8 bytes on x64) in the class that holds it; the object it points to is counted separately.

Array memory sizes are computed as follows (ignoring fixed overhead):

For arrays of struct types:

  • Array memory size = #elements in the array * size of each element

For arrays of reference types:

  • Array memory size = #elements in the array * reference size (4 bytes for x86, 8 bytes for x64)

So a HashSet could reference objects totalling a lot more than the 2GB limit. The limit applies only to the object's own size: if you add up the size taken by each field in the class (64 bits for each reference on x64, and the full size for each struct), the total must be less than 2GB.

You could have a class that contained 16x1GB arrays of bytes, for instance.
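The array-size rules above can be sketched in a few lines of C#. This is only an illustration: the class name ArraySizeEstimate and the helper PayloadBytes are made up for this example, and the figures ignore the fixed per-array overhead, as the formulas above do.

```csharp
using System;

public static class ArraySizeEstimate
{
    // Hypothetical helper: payload size of an array, ignoring the fixed
    // object header and length field.
    public static long PayloadBytes(long elementCount, int bytesPerElement)
    {
        return elementCount * bytesPerElement;
    }

    public static void Main()
    {
        // A reference-type element costs one pointer:
        // 4 bytes in a 32-bit process, 8 bytes in a 64-bit process.
        int referenceSize = IntPtr.Size;
        long count = 1000000;

        // string[]: only the references count, not the strings themselves.
        Console.WriteLine("string[] payload: " + PayloadBytes(count, referenceSize) + " bytes");

        // long[] and Guid[]: struct elements are stored inline at full size.
        Console.WriteLine("long[] payload:   " + PayloadBytes(count, sizeof(long)) + " bytes");
        Console.WriteLine("Guid[] payload:   " + PayloadBytes(count, 16) + " bytes");
    }
}
```

However long the strings are, the string[] figure does not change, which is why the objects reachable from one collection can add up to far more than 2GB.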

Also note that a 64-bit application can be configured to allow arrays larger than 2GB in size (the gcAllowVeryLargeObjects setting, available from .NET 4.5), although a single-dimensional array is still limited to roughly 2 billion elements.

I suspect that the objects that you are storing in the HashSet are reference types, so it's only using 64 bits for each one in the internal HashSet array, while the full size of each of your objects is much larger than 64 bits - which gives a total size in excess of 2GB.

Looking at the reference source for HashSet shows that the following arrays are used:

private int[] m_buckets;
private Slot[] m_slots;

Where Slot is defined like so:

internal struct Slot {
    internal int hashCode;      // Lower 31 bits of hash code, -1 if unused
    internal T value;
    internal int next;          // Index of next entry, -1 if last
}

It looks like each Slot struct occupies 16 bytes on x64 when T is a reference type, which means that HashSet will throw an OutOfMemoryException when the number of slots exceeds 2GB / 16 = 128M (about 134 million) elements.

(If T is a struct then depending on its size you'll run out of memory a lot sooner.)
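The arithmetic behind that estimate can be checked with a short sketch. The names HashSetSlotEstimate and MaxSlots are hypothetical, and the slot sizes are simply the sum of the field sizes from the Slot struct above, assuming no extra padding. Treat the result as an upper bound: HashSet grows its arrays in steps, so an allocation may fail well before this count is reached.

```csharp
using System;

public static class HashSetSlotEstimate
{
    // The 2GB per-object limit discussed above.
    public const long ObjectLimitBytes = 2L * 1024 * 1024 * 1024;

    // Hypothetical helper: how many slots of the given size fit under the limit.
    public static long MaxSlots(int slotSizeBytes)
    {
        return ObjectLimitBytes / slotSizeBytes;
    }

    public static void Main()
    {
        // T is a reference type on x64:
        // int hashCode (4) + reference (8) + int next (4) = 16 bytes per slot.
        Console.WriteLine(MaxSlots(4 + 8 + 4));   // 134217728

        // T is a 16-byte struct such as Guid (assumption: no padding):
        // int hashCode (4) + value (16) + int next (4) = 24 bytes per slot.
        Console.WriteLine(MaxSlots(4 + 16 + 4));  // 89478485
    }
}
```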
