Vector9 - 9 months ago 40

C Question

I came across this algorithm submitted here:How does this algorithm to count the number of set bits in a 32-bit integer work?

I ran the algorithm submitted by "yeer" in the link above as both the algorithms more or less looked the same.

I wrote the traditional(slower method,assuming) code to check how much performance has improved.

Yeer Code:

`unsigned int number=0xFFFFFFFF;`

number= (number & 0x55555555) + ((number>>1) & 0x55555555);

number= (number & 0x33333333) + ((number>>2) & 0x33333333);

number= (number & 0x0F0F0F0F) + ((number>>4) & 0x0F0F0F0F);

number= (number & 0x00FF00FF) + ((number>>8) & 0x00FF00FF);

number= (number & 0x0000FFFF) + ((number>>16) & 0x0000FFFF);

printf("%d",number);

traditional way:

`unsigned int number=0xFFFFFFFF;`

unsigned char i=0;

unsigned char count=0;

for(i=0;i<32;i++)

{

if((number>>i) & 1)

count++;

}

printf("%d",count);

The second code outbeats "yeers" method.

For input value 0xFF(using variable as unsigned char), Traditional= 0.047s, Other= 0.063s

For input value 0xFFFFFFFF(using variable as unsigned int), Traditional= 0.141s, Other= 0.141s

Whats so special in the other algorithm?

I used Codeblocks IDE to run both the codes.

Answer

I ran a simple benchmark test for the diffeent codes. Each snippet was executed 100 million times. The code was compiled `gcc`

with no optmization flags. Each case was exceuted a couple of times to make sure the result was not distorted unduly by other system activity. The execution times varied less than 10% when re-run.

As you can see, the algorithm submitted by yeer is much faster than the other algorithms.

Test driver:

```
int main(int argc, char *argv[]) {
int i, result;
for (i= 0; i < 100000000; i++) {
result = SWAR(i);
}
return 0;
}
```

Code 1, the optimized yeer code:

```
int SWAR(unsigned int i) {
i = i - ((i >> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
```

Time: 0.772s

Code 2, the un-optimized version:

```
int SWAR(unsigned int number) {
number= (number & 0x55555555) + ((number>>1) & 0x55555555);
number= (number & 0x33333333) + ((number>>2) & 0x33333333);
number= (number & 0x0F0F0F0F) + ((number>>4) & 0x0F0F0F0F);
number= (number & 0x00FF00FF) + ((number>>8) & 0x00FF00FF);
number= (number & 0x0000FFFF) + ((number>>16) & 0x0000FFFF);
return number;
}
```

Time: 1.241s

Code 3, bit counter without `if`

:

```
int SWAR(unsigned int number) {
int i, count = 0;
for(i=0;i<32;i++) {
count += (number>>i) & 1;
}
return count;
}
```

Time: 8.921s

Code 4, bit counter with `if`

:

```
int SWAR(unsigned int number) {
int i, count = 0;
for(i=0;i<32;i++) {
if ((number>>i) & 1) {
count++;
}
}
return count;
}
```

Time: 21.058s

Source (Stackoverflow)