FUZxxl - 2 months ago 9
C Question

# How can I effectively encode/decode a compressed position description?

I am writing a tablebase for a Japanese chess variant. To index the table base, I encode each chess position as an integer. In one of the encoding steps, I encode where the pieces are on the board. Since the actual method is a bit complicated, let me explain the problem in a simplified manner.

## The Encoding

In the endgame tablebase, I have (let's say) six distinct chess pieces that I want to distribute over a board with 9 squares. I can naïvely represent their positions by a six-tuple (a, b, c, d, e, f ) where each of the variables a to f is a number in the range 0 to 8 inclusive indicating where the corresponding chess piece is located.

However, this representation is not optimal: no two chess pieces can occupy the same square but the aforementioned encoding happily allows this. We can encode the same position by a six-tuple [a, b', c', d', e', f' ] where a is the same a as before, b' is a number from 0 to 7 inclusive indicating the number of the square the second piece is on. This works by assigning a number from 0 to 7 to each square the first piece is not on. For example, if the first piece is on square 3, the square numbers for the second piece are:

``````1st piece: 0 1 2 3 4 5 6 7 8
2nd piece: 0 1 2 - 3 4 5 6 7
``````

the other pieces are encoded similarly, c' as a number from 0 to 6, d' as a number from 0 to 5, etc. For example the naïve encoding (5, 2, 3, 0, 7, 4) yields the compact encoding (5, 2, 2, 0, 3, 1):

``````1st: 0 1 2 3 4 5 6 7 8 --> 5
2nd: 0 1 2 3 4 - 5 6 7 --> 2
3rd: 0 1 - 2 3 - 4 5 6 --> 2
4th: 0 1 - - 2 - 3 4 5 --> 0
5th: - 0 - - 1 - 2 3 4 --> 3
6th: - 0 - - 1 - 2 - 3 --> 1
``````

In my actual encoding, the number of pieces I want to encode is not fixed. The number of squares on the board however is.

## The Question

How can I efficiently convert the naïve representation to the compact representation and vice versa? I use standard C99 for the program. In the context of this question, I am not interested in answers that use non-standard constructs, inline assembly or intrinsics.

## Question Clarification

As there seems to be some confusion about the question:

• The question is to find a practically efficient way to implement the conversion between the naïve and the compact position representations

• Both representations are n-tuples of integers in certain ranges. The question is not about how to encode these representations into anything else.

• In one of the cases I have, the number of squares is 25 and the number of pieces is up to 12. I am however interested in an implementation that works for a reasonable parameter space (e.g. up to 64 squares and up to 32 pieces).

• I am not interested in alternative representations or encodings, especially representations or encodings that are not optimal.

• Nor am I interested in remarks that the compact representation isn't worth the effort.

• Nor am I interested in answers that use intrinsics, inline assembly or any other non-standard constructs (except perhaps those described by POSIX).

I have found a more elegant solution for up to 16 positions using 64-bit integers with a single loop for both encoding and decoding:

``````#include <stdio.h>
#include <stdlib.h>

void encode16(int dest[], int src[], int n) {
unsigned long long state = 0xfedcba9876543210;
for (int i = 0; i < n; i++) {
int p4 = src[i] * 4;
dest[i] = (state >> p4) & 15;
state -= 0x1111111111111110 << p4;
}
}

void decode16(int dest[], int src[], int n) {
unsigned long long state = 0xfedcba9876543210;
for (int i = 0; i < n; i++) {
int p4 = src[i] * 4;
dest[i] = (state >> p4) & 15;
unsigned long long mask = ((unsigned long long)1 << p4) - 1;
}
}

int main(int argc, char *argv[]) {
int naive[argc], compact[argc];
int n = argc - 1;

for (int i = 0; i < n; i++) {
naive[i] = atoi(argv[i + 1]);
}

encode16(compact, naive, n);
for (int i = 0; i < n; i++) {
printf("%d ", compact[i]);
}
printf("\n");

decode16(naive, compact, n);
for (int i = 0; i < n; i++) {
printf("%d ", naive[i]);
}
printf("\n");
return 0;
}
``````

The code uses 64-bit unsigned integers to hold arrays of 16 values in the range `0..15`. Such an array can be updated in parallel in a single step, extracting a value is straightforward and deleting a value is a bit more cumbersome but still only a few steps.

You could extend this method to 25 positions using non-portable 128-bit integers (type `__int128` is supported by both gcc and clang), encoding each position on 5 bits, taking advantage of the fact that `5 * 25 < 128`, but the magical constants are more cumbersome to write.