user2162550 - 4 months ago
C Question

Using memory barriers to force in-order execution

Following up on my idea that, by using both software and hardware memory barriers, I could disable out-of-order optimization for a specific function inside code compiled with compiler optimizations, and thereby implement a software semaphore using algorithms such as Peterson's or Dekker's that require no out-of-order execution, I tested the following code, which contains both the SW barrier asm volatile("": : :"memory") and the gcc builtin HW barrier __sync_synchronize():

#include <stdio.h>

int main(int argc, char ** argv)
{
    int x = 0;

    asm volatile("": : :"memory");
    __sync_synchronize();
    x = 1;

    asm volatile("": : :"memory");
    __sync_synchronize();
    x = 2;

    asm volatile("": : :"memory");
    __sync_synchronize();
    x = 3;

    printf("%d", x);
    return 0;
}


But the compiler's assembly output is:

main:
.LFB24:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
mfence
mfence
movl $3, %edx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
mfence
call __printf_chk
xorl %eax, %eax
addq $8, %rsp


And if I remove the barriers and compile again, I get:

main:
.LFB24:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $3, %edx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
xorl %eax, %eax
addq $8, %rsp


Both were compiled with gcc -Wall -O2 on Ubuntu 14.04.1 LTS, x86.

I expected the output file of the code containing the memory barriers to include all of the assignments from my source code, with an mfence between each of them.

According to a related Stack Overflow post -

gcc memory barrier __sync_synchronize vs asm volatile("": : :"memory")


When adding your inline assembly on each iteration, gcc is not permitted to change the order of the operations past the barrier


And later on:


However, when the CPU performs this code, it's permitted to reorder
the operations "under the hood", as long as it does not break the memory
ordering model. This means that the operations can be performed
out of order (if the CPU supports that, as most do these days). A HW
fence would have prevented that.


But as you can see, the only difference between the code with the memory barriers and the code without them is that the former contains mfence instructions placed in a way I did not expect, and not all of the assignments are included.

Why is the output of the file with the memory barriers not what I expected? Why has the ordering of the mfence instructions been altered? Why did the compiler remove some of the assignments? Is the compiler allowed to make such optimizations even when a memory barrier is applied between every single line of code?

References to the memory barrier types and usage:


a3f
Answer

The memory barriers tell the compiler/CPU that instructions shouldn't be reordered across the barrier; they don't mean that writes which can be proven pointless have to be performed anyway.

If you define your x as volatile, the compiler can't make the assumption that it is the only entity that cares about x's value, and it has to follow the rules of the C abstract machine, which require the memory writes to actually happen.

In your specific case you could then skip the barriers, because volatile accesses are already guaranteed not to be reordered against each other.
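
For illustration, here is a minimal sketch of that variant, assuming the same test program as above with the barriers dropped and x declared volatile:

#include <stdio.h>

int main(void)
{
    /* volatile: every access to x is observable behaviour, so the
       compiler must emit all three stores, in source order. */
    volatile int x = 0;

    x = 1;
    x = 2;
    x = 3;

    printf("%d\n", x);
    return 0;
}

Compiled with -O2, each assignment should now show up as its own store in the generated assembly, since eliminating or merging volatile accesses is not allowed.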

If you have C11 support, you are better off using _Atomic, which can additionally guarantee that normal assignments won't be reordered against your x and that the accesses are atomic.
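
A minimal sketch of the C11 approach, assuming <stdatomic.h> is available (the default memory_order_seq_cst ordering already implies the required hardware fences on x86):

#include <stdatomic.h>
#include <stdio.h>

int main(void)
{
    /* _Atomic with the default seq_cst ordering: each access is atomic,
       and other memory operations are not reordered across it. */
    _Atomic int x = 0;

    atomic_store(&x, 1);
    atomic_store(&x, 2);
    atomic_store(&x, 3);

    printf("%d\n", atomic_load(&x));
    return 0;
}

Sequentially consistent atomics are also the ordering strength that Peterson's and Dekker's algorithms rely on, so for the actual lock you would typically use seq_cst loads and stores on the flag and turn variables rather than plain assignments plus explicit fences.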
