gsamaras - 10 days ago 5
C++ Question

# Query points on the vertices of a Hamming cube

I have N points that lie only on the vertices of a cube, of dimension D, where D is something like 3.

A vertex may not contain any point. So every point has coordinates in {0, 1}D. I am only interested in query time, as long as the memory cost is reasonable ( not exponential in N for example :) ).

Given a query that lies on one of the cube's vertices and an input parameter

`r`
, find all the vertices (thus points) that have hamming distance <=
`r`
with the query.

What's the way to go in a environment?

I am thinking of a kd-tree, but I am not sure and want help, any input, even approximative, would be appreciated! Since hamming distance comes into play, bitwise manipulations should help (e.g. XOR).

There is a nice bithack to go from one bitmask with `k` bits set to the lexicographically next permutation, which means it's fairly simple to loop through all masks with `k` bits set. XORing these masks with an initial value gives all the values at hamming distance exactly `k` away from it.

So for `D` dimensions, where `D` is less than 32 (otherwise change the types),

``````uint32_t limit = (1u << D) - 1;
for (int k = 1; k <= r; k++) {
uint32_t diff = (1u << k) - 1;
while (diff <= limit) {
// v is the input vertex
uint32_t vertex = v ^ diff;
// use it
diff = nextBitPermutation(diff);
}
}
``````

Where `nextBitPermutation` may be implemented in C++ as something like (if you have `__builtin_ctz`)

``````uint32_t nextBitPermutation(uint32_t v) {
// see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
uint32_t t = v | (v - 1);
return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}
``````

Or for MSVC (not tested)

``````uint32_t nextBitPermutation(uint32_t v) {
// see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
uint32_t t = v | (v - 1);
unsigned long tzc;
_BitScanForward(&tzc, v); // v != 0 so the return value doesn't matter
return (t + 1) | (((~t & -~t) - 1) >> (tzc + 1));
}
``````

If D is really low, 4 or lower, the old `popcnt`-with-`pshufb` works really well and generally everything just lines up well, like this:

``````uint16_t query(int vertex, int r, int8_t* validmask)
{
// validmask should be array of 8 int8_t's,
// 0 for a vertex that doesn't exist, -1 if it does
__m128i t0 = _mm_set1_epi8(vertex);
__m128i r0 = _mm_set1_epi8(r + 1);
__m128i all =        _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
__m128i popcnt_lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2,  2,  3,  2,  3,  3,  4);
__m128i dist = _mm_shuffle_epi8(popcnt_lut, _mm_xor_si128(t0, all));
__m128i close_enough = _mm_cmpgt_epi8(r0, dist);
__m128i result = _mm_and_si128(close_enough, valid);
This should be fairly fast; fast compared to the bithack above (`nextBitPermutation`, which is fairly heavy, is used a lot there) and also compared to looping over all vertices and testing whether they are in range (even with builtin `popcnt`, that automatically takes at least 16 cycles and the above shouldn't, assuming everything is cached or even permanently in a register). The downside is the result is annoying to work with, since it's a mask of which vertices both exist and are in range of the queried point, not a list of them. It would combine well with doing some processing on data associated with the points though.