
C Question

As the title says, I am searching for the optimal way of storing sets in memory. I am only interested in sets of bytes (arrays of distinct integers from `0` to `255`).

The first method I came up with is to allocate an array of `256` bits (`32` bytes), where the bit at position `n` is set if and only if `n` is in the set.
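A minimal sketch of this bitmap representation (the type and function names are mine, not from the question):

```c
#include <stdint.h>
#include <string.h>

/* 256-bit bitmap: bit n is 1 iff byte value n is in the set (32 bytes total). */
typedef struct { uint8_t bits[32]; } ByteSet;

static void set_init(ByteSet *s)              { memset(s->bits, 0, sizeof s->bits); }
static void set_add(ByteSet *s, uint8_t n)    { s->bits[n >> 3] |= (uint8_t)(1u << (n & 7)); }
static void set_remove(ByteSet *s, uint8_t n) { s->bits[n >> 3] &= (uint8_t)~(1u << (n & 7)); }
static int  set_contains(const ByteSet *s, uint8_t n) { return (s->bits[n >> 3] >> (n & 7)) & 1; }
```

Every set, from empty to full, costs exactly `sizeof(ByteSet) == 32` bytes.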

The second approach I tried is to store sets as regular arrays. So, if a set contains `n` elements, I store `n + 1` bytes: one byte for the number of elements, followed by the elements themselves.
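A sketch of that length-prefixed encoding (my own helper, taking membership as a flag array; note, as my own observation, that a full set of 256 elements cannot express its count in a single byte):

```c
#include <stdint.h>
#include <stddef.h>

/* Length-prefixed encoding: buf[0] holds the element count n, buf[1..n] the
 * elements in ascending order. n can be 0..256, so a full set (n == 256)
 * does not actually fit in the one count byte -- a weakness of this scheme. */
static size_t encode_array(const int present[256], uint8_t buf[257]) {
    size_t n = 0;
    for (int v = 0; v < 256; v++)
        if (present[v]) buf[1 + n++] = (uint8_t)v;
    buf[0] = (uint8_t)n;          /* truncates if n == 256 */
    return n + 1;                 /* total bytes written */
}
```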

My third attempt is to enumerate all possible sets and then just store the index of the set (an integer representing its index in the list of all possible sets of bytes). But it turned out that this is absolutely equivalent to the first approach: the index itself will still need `32` bytes.

Fourth attempt I made is based on my second approach. I noticed that if the set contains `n` elements and I keep them sorted, then the element at position `k` is strictly greater than the element at position `k - 1`. So instead of storing the elements directly, I store each element increased by its position, and decoding subtracts the position back out. For example, the array `[3, 3, 5, 7]` (the leading `3` is the number of elements) decodes to the set `{3, 4, 5}`. A side effect is that not every array is a valid encoding: `[3, 3, 5, 6]` would decode to `{3, 4, 4}`, which is not a valid set, because any two consecutive stored values must differ by at least `2`. That redundancy means some bit patterns are never used, and my rough estimate was that exploiting it saves about one bit for half of the elements, i.e. around `n / 2 * 1 / 8 = n / 16` bytes per set.
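Assuming the position-shift reading of the `[3, 3, 5, 7]` → `{3, 4, 5}` example (my interpretation of it), decoding is just a subtraction per element:

```c
#include <stdint.h>
#include <stddef.h>

/* Decode the position-shifted form: buf = [n, e0, e1, ...], where element k
 * of the set is buf[1 + k] - k. E.g. [3, 3, 5, 7] -> {3, 4, 5}. */
static size_t decode_shifted(const uint8_t *buf, uint8_t out[256]) {
    size_t n = buf[0];
    for (size_t k = 0; k < n; k++)
        out[k] = (uint8_t)(buf[1 + k] - k);
    return n;   /* number of elements written to out */
}
```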

Fifth approach I made is similar to my second approach, but it interprets the number of elements differently. If the number of elements is less than `128`, the array stores the elements that belong to the set, exactly as before; if it is `128` or more, the length byte signals that the array stores the complement instead, i.e. the elements that are NOT in the set. This caps the worst case at roughly half that of the second approach.
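A sketch of that complement trick, under my reading that the count byte itself tells the decoder which form follows:

```c
#include <stdint.h>
#include <stddef.h>

/* buf[0] = n (the set's size). If n < 128 the n present values follow; if
 * n >= 128 the 256 - n absent values follow instead. (n == 256 would not fit
 * in the count byte -- a corner case this sketch ignores.) */
static size_t encode_complement(const int present[256], uint8_t buf[130]) {
    size_t n = 0;
    for (int v = 0; v < 256; v++) n += (present[v] != 0);

    int store_absent = (n >= 128);   /* complement has at most 128 entries */
    size_t m = 0;
    for (int v = 0; v < 256; v++)
        if ((present[v] != 0) != store_absent) buf[1 + m++] = (uint8_t)v;

    buf[0] = (uint8_t)n;
    return m + 1;                    /* at most 129 bytes total */
}
```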

My last attempt (sixth attempt) is to enumerate just some sets (for example, create a list of base sets containing: the full set, the set of even numbers, the set of odd numbers, the set of elements less than 128, the set of elements greater than 128, etc.) and then use sets from that list and basic set operations (union, intersection, etc.) to reconstruct the original set. It requires a few bytes for each base set used, a few bits for each union or intersection operation, and one byte for the length of the sequence. The result depends heavily on the number of sets in the base list, which would have to be hardcoded, and it seems hard to properly construct and choose the sets in that list. Anyway, something tells me that this is not a very clever approach.
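The base-set machinery itself is straightforward if the base sets are kept as 256-bit bitmaps; the particular base sets below are illustrative choices from the examples in the paragraph, not a proposal for the real list:

```c
#include <stdint.h>

typedef struct { uint8_t bits[32]; } ByteSet;

/* Combine base sets with the operations the reconstruction sequence names. */
static ByteSet set_union(ByteSet a, ByteSet b) {
    for (int i = 0; i < 32; i++) a.bits[i] |= b.bits[i];
    return a;
}
static ByteSet set_intersect(ByteSet a, ByteSet b) {
    for (int i = 0; i < 32; i++) a.bits[i] &= b.bits[i];
    return a;
}

/* Two example base sets: all even values, and all values below 128. */
static ByteSet evens(void) {
    ByteSet s;
    for (int i = 0; i < 32; i++) s.bits[i] = 0x55;   /* bits 0,2,4,6 of each byte */
    return s;
}
static ByteSet low_half(void) {
    ByteSet s;
    for (int i = 0; i < 32; i++) s.bits[i] = (i < 16) ? 0xFF : 0x00;
    return s;
}
```

Intersecting these two yields "even values below 128"; the hard part, as noted above, is choosing a base list so that arbitrary sets have short reconstruction sequences.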

But what is actually the most optimal way? Something tells me that my fourth attempt is not so bad, but can we do better? The sets I operate with have a random number of elements, so on average a set contains about `128` elements; with `128` elements the array-based approaches are at their worst, and the saving from my fourth attempt is only around `128 / 16 = 8` bytes.

Just to mention again, speed is not important. Encoding/decoding may be extremely slow; the only important thing is that the sets require as little memory as possible.

P.S. If you want some code (though I really don't see why you would), I can post the solutions I wrote in C for all of these approaches. Anyway, I am not asking for code or for technical details of a specific programming language; I am just asking for a method/algorithm for compressing such sets.

Thank you in advance.


Answer

Your first method (and the third method, which is equivalent) is already optimal. It cannot be improved.

There are 2^256 possible sets of the numbers you're working with. By the pigeonhole principle, you need 2^256 distinct identifiers to tell them all apart, and you'll need 256 bits to represent that many identifiers. Any method of identifying the sets which used fewer than 256 bits would leave at least one pair (and probably many pairs) of sets sharing the same identifier.
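This is a worst-case statement; as a side note on the average case, even the question's complement scheme stays well above 32 bytes. A quick computation, assuming (which the question does not state) that the element count is uniform on 0..256:

```c
/* Average encoded size of the "store the smaller of set or complement"
 * scheme: one length byte plus min(n, 256 - n) element bytes, averaged over
 * n = 0..256 taken as uniformly likely (MY assumption, for illustration).
 * Compare with the flat 32 bytes of the bitmap. */
static double avg_complement_size(void) {
    double total = 0.0;
    for (int n = 0; n <= 256; n++)
        total += 1 + (n < 256 - n ? n : 256 - n);
    return total / 257.0;   /* roughly 64.8 bytes on average */
}
```

So under that uniform-count assumption the array-of-elements family averages about twice the bitmap's cost, on top of being worse in the worst case.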

