Elad Weiss - 10 months ago

C Question

Here is what I'm trying to achieve. It's simple enough:

```c
unsigned int foo1(bool cond, unsigned int num)
{
    return cond ? num : 0;
}
```

Assembly:

```asm
test   dil, dil
mov    eax, 0
cmovne eax, esi
ret
```

My question is: is there a faster way to do this? Here are some alternatives I thought of:

```c
unsigned int foo2(bool cond, unsigned int num)
{
    return cond * num;
}
```

Assembly:

```asm
movzx eax, dil
imul  eax, esi
ret
```

```c
unsigned int foo3(bool cond, unsigned int num)
{
    static const unsigned int masks[2] = { 0x0, 0xFFFFFFFF };
    return masks[cond] & num;
}
```

Assembly:

```asm
movzx edi, dil
mov   eax, DWORD PTR foo3(bool, unsigned int)::masks[0+rdi*4]
and   eax, esi
ret
```

```c
unsigned int foo4(bool cond, unsigned int num)
{
    return (0 - (unsigned)cond) & num;
}
```

Assembly:

```asm
movzx eax, dil
neg   eax
and   eax, esi
ret
```

Multiplication yields the fewest instructions, so I think it's the best choice, but I'm not sure how expensive `imul` is. Any suggestions?

Thanks in advance,

Answer

After reading all the enlightening answers and comments,

**I believe this is the correct answer:**

When getting to this level of micro-optimization, **there is no one 'best' choice**, as it may vary depending on the platform, the OS, and the code it is used in.

So, it seems to me that the correct approach, software-wise, is to write more than one implementation and encapsulate them behind some abstraction so they can easily be switched.

When benchmarking, switch between them to see which one performs best in the actual SITUATION.

Of course, we can rule out solutions that are obviously worse than the others.