In Java, when we have two threads sharing the following variables:

    volatile int b;

    a = 5;
    b = 6;
    if (b == 6)
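That guarantee can be shown as a runnable sketch (the class, method, and field names are mine; I'm assuming `a` is a plain `int` field alongside the volatile `b`):

```java
// Sketch: once the reader observes b == 6, the Java Memory Model
// guarantees the earlier plain write a = 5 is visible as well.
public class VolatileVisibility {
    static int a;              // plain field
    static volatile int b;     // volatile field

    // Returns the value of a that the reader observes after seeing b == 6.
    static int run() throws InterruptedException {
        final int[] seen = new int[1];
        Thread writer = new Thread(() -> {
            a = 5;             // ordinary write
            b = 6;             // volatile write publishes it
        });
        Thread reader = new Thread(() -> {
            while (b != 6) { } // spin on the volatile read
            seen[0] = a;       // happens-before: must see 5 here
        });
        reader.start();
        writer.start();
        writer.join();
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader saw a = " + run());
    }
}
```

The volatile write/read pair establishes a happens-before edge, so the reader can never observe `a == 0` after seeing `b == 6`; remove `volatile` and that guarantee disappears.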
From Doug Lea's JSR-133 Cookbook:

> **LoadLoad Barriers:** The sequence `Load1; LoadLoad; Load2` ensures that Load1's data are loaded before data accessed by Load2 and all subsequent load instructions are loaded. In general, explicit LoadLoad barriers are needed on processors that perform speculative loads and/or out-of-order processing in which waiting load instructions can bypass waiting stores. On processors that guarantee to always preserve load ordering, the barriers amount to no-ops.
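Since JDK 9 the cookbook's barriers can also be spelled out at the Java level with the static fence methods on `java.lang.invoke.VarHandle`; here is a minimal sketch of the `Load1; LoadLoad; Load2` sequence (class and field names are mine):

```java
import java.lang.invoke.VarHandle;

public class LoadLoadDemo {
    static int x = 1;
    static int y = 2;

    // Load1; LoadLoad; Load2 spelled out with the JDK 9+ fence API.
    static int[] orderedLoads() {
        int load1 = x;             // Load1
        VarHandle.loadLoadFence(); // LoadLoad: Load1 is ordered before
                                   // every subsequent load
        int load2 = y;             // Load2
        return new int[] { load1, load2 };
    }

    public static void main(String[] args) {
        int[] r = orderedLoads();
        System.out.println(r[0] + " " + r[1]);
    }
}
```

On x86 such a fence costs nothing at the hardware level (a no-op, per the cookbook text above), but it still constrains compiler reordering.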
I will give one example of how this is achieved; you can read more on the details here. For x86 processors, as you indicated, LoadLoad ends up being a no-op. In the article I linked, Mark points out that

> Doug lists the StoreStore, LoadLoad and LoadStore
So in essence the only barrier needed on x86 is a StoreLoad. How is that achieved at the machine level?
This is an excerpt from the blog:
> Here's the code it generated for both volatile and non-volatile reads:
>
>     nop                       ;*synchronization entry
>     mov    0x10(%rsi),%rax    ;*getfield x
>
> And for volatile writes:
>
>     xchg   %ax,%ax
>     movq   $0xab,0x10(%rbx)
>     lock addl $0x0,(%rsp)     ;*putfield x
The `lock`-prefixed instruction is the StoreLoad barrier listed in Doug's cookbook. But a locked instruction also synchronizes all reads with other processors, as documented:

> Locked instructions can be used to synchronize data written by one processor and read by another processor.

This reduces the overhead of having to issue separate LoadLoad and LoadStore barriers for volatile loads.
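If you want that full barrier explicitly in Java code, `VarHandle.fullFence()` (JDK 9+) requests a StoreLoad-inclusive fence, and on x86 HotSpot typically emits the same `lock`-prefixed instruction for it. A sketch, with names of my choosing:

```java
import java.lang.invoke.VarHandle;

public class StoreLoadDemo {
    static long x;             // deliberately a plain (non-volatile) field

    // Store, then a full fence, then a load: the fence forces the store
    // to drain from the store buffer before the subsequent load runs.
    static long storeThenLoad() {
        x = 0xab;              // the store
        VarHandle.fullFence(); // StoreLoad barrier (lock addl on x86)
        return x;              // the load
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(storeThenLoad()));
    }
}
```

A volatile write already carries this fence implicitly, which is exactly what the `lock addl $0x0,(%rsp)` in the excerpt above shows.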
All that being said, I will reiterate what assylias noted: the way it happens should not be important to a developer (if you are a processor/compiler implementer, that is another story). The `volatile` keyword is kind of an interface saying