Dr.Kameleon - 1 month ago
C++ Question

"Isolate" specific Row/Column/Diagonal from a 64-bit number

OK, let's consider a 64-bit number, with its bits forming an 8x8 table.

E.g.


01101010 01101011 01111010 01101010 11101010 01101010 01101110 01101010


written as

a b c d e f g h
----------------
0 1 1 0 1 0 1 0
0 1 1 0 1 0 1 1
0 1 1 1 1 0 1 0
0 1 1 0 1 0 1 0
1 1 1 0 1 0 1 0
0 1 1 0 1 0 1 0
0 1 1 0 1 1 1 0
0 1 1 0 1 0 1 0
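To make the layout concrete, here is a minimal sketch of one way to map the table onto a 64-bit value. It assumes the first written row is the most significant byte and column a is the most significant bit of each byte; the question does not state a convention, so this is only an assumption (it is the layout the answer's column mask implies). The names `make_board`, `bit_at` and `example_board` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack eight row bytes (top row first) into one 64-bit board.
// Assumes the top row becomes the most significant byte.
constexpr std::uint64_t make_board(const std::array<std::uint8_t, 8>& rows) {
    std::uint64_t board = 0;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        board = (board << 8) | rows[i];
    }
    return board;
}

// Hypothetical helper: read the cell at (row, col), both 0-based,
// with row 0 the top row and col 0 the column labelled "a".
constexpr int bit_at(std::uint64_t board, int row, int col) {
    const int index = 63 - (8 * row + col);  // bit 63 is the top-left cell
    return static_cast<int>((board >> index) & 1u);
}

// The example table from the question, one byte per written row.
constexpr std::uint64_t example_board = make_board(
    {0x6A, 0x6B, 0x7A, 0x6A, 0xEA, 0x6A, 0x6E, 0x6A});

static_assert(example_board == 0x6A6B7A6AEA6A6E6Aull, "rows are packed top to bottom");
static_assert(bit_at(example_board, 2, 3) == 1, "column d has its only set bit in the third row");
```

Under this assumption, column d reads 0, 0, 1, 0, 0, 0, 0, 0 from top to bottom.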


Now, what if we want to isolate JUST e.g. column d (00100000), or any row/diagonal for that matter?

Can this be done? And if so, how?




HINTS :


  • (a) My main objective here - though not initially mentioned - is raw speed. I'm searching for the fastest algorithm around, since the "retrieval" function is being performed some millions of times per second.

  • (b) This is what comes closest to what I mean : http://chessprogramming.wikispaces.com/Kindergarten+Bitboards


Answer

Here's a solution with only 4 main steps:

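#include <cstdint>

const uint64_t column_mask = 0x8080808080808080ull;  // top bit of every byte
const uint64_t magic = 0x2040810204081ull;           // one set bit every 7 positions

// col is 0..7, with 0 the left-most column
int get_col(uint64_t board, int col) {
    uint64_t column = (board << col) & column_mask;  // align the column, drop everything else
    column *= magic;                                 // gather the 8 bits into the top byte
    return (column >> 56) & 0xff;                    // move the top byte down
}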

It works like this:

  • the board is shifted left by col (0..7) to align the column with the left side
  • it's masked so that only the required column remains
  • it's multiplied by a magic number, which pushes copies of all the original bits into the left-most byte
  • the left-most byte is shifted to the right
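The four steps can be checked against a plain bit-by-bit loop. The sketch below repeats `get_col` so it is self-contained; `get_col_naive` and `check_all_columns` are hypothetical names, and the layout (col 0 is column a, the top row is the most significant byte) is an assumption inferred from the mask, not something the answer spells out.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t column_mask = 0x8080808080808080ull;
const std::uint64_t magic = 0x2040810204081ull;

// The multiply-based version from the answer, repeated here unchanged in behavior.
int get_col(std::uint64_t board, int col) {
    std::uint64_t column = (board << col) & column_mask;
    column *= magic;
    return (column >> 56) & 0xff;
}

// Hypothetical reference: gather the column one bit at a time.
// Assumes col 0 is column "a" (the most significant bit of each byte)
// and puts the top row into the most significant bit of the result.
int get_col_naive(std::uint64_t board, int col) {
    int result = 0;
    for (int row = 0; row < 8; ++row) {
        const int index = 63 - (8 * row + col);
        result = (result << 1) | static_cast<int>((board >> index) & 1u);
    }
    return result;
}

// Hypothetical check: both versions must agree on every column of a board.
void check_all_columns(std::uint64_t board) {
    for (int col = 0; col < 8; ++col) {
        assert(get_col(board, col) == get_col_naive(board, col));
    }
}
```

With the question's example board packed as 0x6A6B7A6AEA6A6E6A under this layout, `get_col(board, 3)` gives 0x20, i.e. 00100000 for column d.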

The magic number is chosen to copy only the needed bits into the left-most byte and let the rest fall into unused places or overflow out of the number. The process looks like this (digits are bit "IDs", rather than the number itself):

original column: ...1.......2.......3.......4.......5.......6.......7.......8....
aligned column:  1.......2.......3.......4.......5.......6.......7.......8.......
multiplied:      123456782345678.345678..45678...5678....678.....78......8.......
shifted to right:........................................................12345678
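The "multiplied" row can be reproduced by writing the multiplication out as shifts. The magic number has a set bit at every 7th position (0, 7, ..., 49), so multiplying by it adds up eight copies of the aligned column, each moved a further 7 bits to the left. A minimal sketch; `build_magic` and `spread_by_shifts` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t column_mask = 0x8080808080808080ull;
constexpr std::uint64_t magic = 0x2040810204081ull;

// Hypothetical helper: rebuild the magic constant from its set bits (0, 7, 14, ..., 49).
constexpr std::uint64_t build_magic() {
    std::uint64_t m = 0;
    for (int j = 0; j < 8; ++j) {
        m |= std::uint64_t{1} << (7 * j);
    }
    return m;
}
static_assert(build_magic() == magic, "magic is one set bit every 7 positions");

// Hypothetical helper: the same value as `aligned * magic`, written as shifted copies.
// `aligned` is expected to hold only bits 7, 15, ..., 63 (already masked with column_mask).
// The shifted copies never land on the same bit, so there are no carries
// and OR-ing them gives the same value as the product.
constexpr std::uint64_t spread_by_shifts(std::uint64_t aligned) {
    std::uint64_t sum = 0;
    for (int j = 0; j < 8; ++j) {
        sum |= aligned << (7 * j);  // bits pushed past bit 63 are discarded
    }
    return sum;
}
```

Because no two copies collide, the top byte ends up holding exactly one copy of each column bit, which is what the final shift by 56 extracts.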

If you add the const keywords, the generated assembly becomes quite nice, actually:

get_col:
.LFB7:
        .cfi_startproc
        movl    %esi, %ecx
        movabsq $-9187201950435737472, %rax
        salq    %cl, %rdi
        andq    %rax, %rdi
        movabsq $567382630219905, %rax
        imulq   %rax, %rdi
        shrq    $56, %rdi
        movl    %edi, %eax
        ret

No branching, no external data, around 0.4ns per calculation.

Edit: this takes around a sixth of the time, using NPE's solution (the next fastest one) as the baseline.
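The answer covers columns only. As a rough sketch of how the same approach might carry over to the rows and the main diagonal the question also asks about, under the same assumed layout (top row is the most significant byte, column a the most significant bit of each byte): `get_row` and `get_main_diagonal` are hypothetical names, and this has not been benchmarked against the figures above.

```cpp
#include <cstdint>

// Hypothetical helper: row 0 is the top row, i.e. the most significant byte.
constexpr int get_row(std::uint64_t board, int row) {
    return static_cast<int>((board >> (8 * (7 - row))) & 0xff);
}

// Hypothetical helper: the diagonal running from the top-left cell (bit 63)
// to the bottom-right cell (bit 0). The top-left cell ends up in the
// most significant bit of the returned byte.
constexpr int get_main_diagonal(std::uint64_t board) {
    constexpr std::uint64_t diagonal_mask = 0x8040201008040201ull;
    constexpr std::uint64_t byte_replicator = 0x0101010101010101ull;
    // Each diagonal bit sits in a different byte and a different bit position,
    // so adding the eight byte-shifted copies stacks them in the top byte without carries.
    return static_cast<int>(((board & diagonal_mask) * byte_replicator) >> 56);
}
```

A row needs no multiplication at all since its bits are already adjacent; the diagonal needs only a mask, one multiply and one shift, much like the column case.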