FUZxxl FUZxxl - 2 months ago 9
C Question

How can I effectively encode/decode a compressed position description?

I am writing a tablebase for a Japanese chess variant. To index the table base, I encode each chess position as an integer. In one of the encoding steps, I encode where the pieces are on the board. Since the actual method is a bit complicated, let me explain the problem in a simplified manner.

The Encoding



In the endgame tablebase, I have (let's say) six distinct chess pieces that I want to distribute over a board with 9 squares. I can naïvely represent their positions by a six-tuple (a, b, c, d, e, f ) where each of the variables a to f is a number in the range 0 to 8 inclusive indicating where the corresponding chess piece is located.

However, this representation is not optimal: no two chess pieces can occupy the same square but the aforementioned encoding happily allows this. We can encode the same position by a six-tuple [a, b', c', d', e', f' ] where a is the same a as before, b' is a number from 0 to 7 inclusive indicating the number of the square the second piece is on. This works by assigning a number from 0 to 7 to each square the first piece is not on. For example, if the first piece is on square 3, the square numbers for the second piece are:

1st piece: 0 1 2 3 4 5 6 7 8
2nd piece: 0 1 2 - 3 4 5 6 7


the other pieces are encoded similarly, c' as a number from 0 to 6, d' as a number from 0 to 5, etc. For example the naïve encoding (5, 2, 3, 0, 7, 4) yields the compact encoding (5, 2, 2, 0, 3, 1):

1st: 0 1 2 3 4 5 6 7 8 --> 5
2nd: 0 1 2 3 4 - 5 6 7 --> 2
3rd: 0 1 - 2 3 - 4 5 6 --> 2
4th: 0 1 - - 2 - 3 4 5 --> 0
5th: - 0 - - 1 - 2 3 4 --> 3
6th: - 0 - - 1 - 2 - 3 --> 1


In my actual encoding, the number of pieces I want to encode is not fixed. The number of squares on the board however is.

The Question



How can I efficiently convert the naïve representation to the compact representation and vice versa? I use standard C99 for the program. In the context of this question, I am not interested in answers that use non-standard constructs, inline assembly or intrinsics.

Question Clarification



As there seems to be some confusion about the question:


  • The question is to find a practically efficient way to implement the conversion between the naïve and the compact position representations

  • Both representations are n-tuples of integers in certain ranges. The question is not about how to encode these representations into anything else.

  • In one of the cases I have, the number of squares is 25 and the number of pieces is up to 12. I am however interested in an implementation that works for a reasonable parameter space (e.g. up to 64 squares and up to 32 pieces).

  • I am not interested in alternative representations or encodings, especially representations or encodings that are not optimal.

  • Nor am I interested in remarks that the compact representation isn't worth the effort.

  • Nor am I interested in answers that use intrinsics, inline assembly or any other non-standard constructs (except perhaps those described by POSIX).


Answer

I have found a more elegant solution for up to 16 positions using 64-bit integers with a single loop for both encoding and decoding:

#include <stdio.h>
#include <stdlib.h>

void encode16(int dest[], int src[], int n) {
    unsigned long long state = 0xfedcba9876543210;
    for (int i = 0; i < n; i++) {
        int p4 = src[i] * 4;
        dest[i] = (state >> p4) & 15;
        state -= 0x1111111111111110 << p4;
    }
}

void decode16(int dest[], int src[], int n) {
    unsigned long long state = 0xfedcba9876543210;
    for (int i = 0; i < n; i++) {
        int p4 = src[i] * 4;
        dest[i] = (state >> p4) & 15;
        unsigned long long mask = ((unsigned long long)1 << p4) - 1;
        state = (state & mask) | ((state >> 4) & ~mask);
    }
}

int main(int argc, char *argv[]) {
    int naive[argc], compact[argc];
    int n = argc - 1;

    for (int i = 0; i < n; i++) {
        naive[i] = atoi(argv[i + 1]);
    }

    encode16(compact, naive, n);
    for (int i = 0; i < n; i++) {
        printf("%d ", compact[i]);
    }
    printf("\n");

    decode16(naive, compact, n);
    for (int i = 0; i < n; i++) {
        printf("%d ", naive[i]);
    }
    printf("\n");
    return 0;
}

The code uses 64-bit unsigned integers to hold arrays of 16 values in the range 0..15. Such an array can be updated in parallel in a single step, extracting a value is straightforward and deleting a value is a bit more cumbersome but still only a few steps.

You could extend this method to 25 positions using non-portable 128-bit integers (type __int128 is supported by both gcc and clang), encoding each position on 5 bits, taking advantage of the fact that 5 * 25 < 128, but the magical constants are more cumbersome to write.

Comments