Vector9 Vector9 - 2 months ago 7
C Question

"Counting number of bits set"- why this algorithm?

I came across this algorithm submitted here:How does this algorithm to count the number of set bits in a 32-bit integer work?

I ran the algorithm submitted by "yeer" in the link above as both the algorithms more or less looked the same.

I wrote the traditional(slower method,assuming) code to check how much performance has improved.

Yeer Code:

unsigned int number=0xFFFFFFFF;

number= (number & 0x55555555) + ((number>>1) & 0x55555555);
number= (number & 0x33333333) + ((number>>2) & 0x33333333);
number= (number & 0x0F0F0F0F) + ((number>>4) & 0x0F0F0F0F);
number= (number & 0x00FF00FF) + ((number>>8) & 0x00FF00FF);
number= (number & 0x0000FFFF) + ((number>>16) & 0x0000FFFF);

printf("%d",number);


traditional way:

unsigned int number=0xFFFFFFFF;
unsigned char i=0;
unsigned char count=0;

for(i=0;i<32;i++)
{
if((number>>i) & 1)
count++;
}

printf("%d",count);


The second code outbeats "yeers" method.

For input value 0xFF(using variable as unsigned char), Traditional= 0.047s, Other= 0.063s
For input value 0xFFFFFFFF(using variable as unsigned int), Traditional= 0.141s, Other= 0.141s

Whats so special in the other algorithm?

I used Codeblocks IDE to run both the codes.

Answer

I ran a simple benchmark test for the diffeent codes. Each snippet was executed 100 million times. The code was compiled gcc with no optmization flags. Each case was exceuted a couple of times to make sure the result was not distorted unduly by other system activity. The execution times varied less than 10% when re-run.

As you can see, the algorithm submitted by yeer is much faster than the other algorithms.

Test driver:

int main(int argc, char *argv[]) {
    int i, result;
    for (i= 0; i < 100000000; i++) {
        result = SWAR(i);
    }
    return 0;
}

Code 1, the optimized yeer code:

int SWAR(unsigned int i) {
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

Time: 0.772s

Code 2, the un-optimized version:

int SWAR(unsigned int number) {
    number= (number & 0x55555555) + ((number>>1) & 0x55555555);
    number= (number & 0x33333333) + ((number>>2) & 0x33333333);
    number= (number & 0x0F0F0F0F) + ((number>>4) & 0x0F0F0F0F);
    number= (number & 0x00FF00FF) + ((number>>8) & 0x00FF00FF);
    number= (number & 0x0000FFFF) + ((number>>16) & 0x0000FFFF);
    return number;
}

Time: 1.241s

Code 3, bit counter without if:

int SWAR(unsigned int number) {
  int i, count = 0;
  for(i=0;i<32;i++) {
    count += (number>>i) & 1;
  }
    return count;
}

Time: 8.921s

Code 4, bit counter with if:

int SWAR(unsigned int number) {
  int i, count = 0;
  for(i=0;i<32;i++) {
    if ((number>>i) & 1) {
        count++;
    }
  }
    return count;
}

Time: 21.058s