Does the compiler simply check which variables are being modified between the lock and unlock statements and bind them to the mutex so there is exclusive access to them?
Or does a mutex do something more than that?
m is a variable of type std::mutex.
Imagine this sequence:
    int a;
    m.lock();
    b += 1;
    a = b;
    m.unlock();
    do_something_with(a);
There is an 'obvious' thing going on here:
The assignment of a from b and the increment of b are 'protected' from interference from other threads, because any other thread will attempt to lock the same m, and will be blocked until we call m.unlock().
And there is a more subtle thing going on.
In single-threaded code, the compiler will seek to re-order loads and stores. Without the locks, the compiler would be free to effectively re-write your code if this turned out to be more efficient on your chipset:
    int a = b + 1;
    m.lock();
    b = a;
    m.unlock();
    do_something_with(a);
std::mutex::lock(), std::mutex::unlock(), std::future::get() and so on are fences. The compiler 'knows' that it may not reorder loads and stores (reads and writes) in such a way that the operation ends up on the other side of the fence from where you specified when you wrote the code.
    1:
    2: m.lock();    <--- this is a fence
    3: b += 1;      <--- so this load/store operation may not move above line 2
    4: m.unlock();  <--- nor may it be moved below this line
Imagine what would happen if this wasn't the case:
    thread1: int a = b + 1;   <--- here another thread precedes us and executes the same block of code
    thread2: int a = b + 1;
    thread2: m.lock();
    thread2: b = a;
    thread2: m.unlock();
    thread1: m.lock();
    thread1: b = a;
    thread1: m.unlock();
    thread1: do_something_with(a);
    thread2: do_something_with(a);
If you follow it through, you'll see that b ends up with the wrong value, because the compiler was trying to make your code faster.
...and that's only the compiler optimisations.
std::mutex etc. also prevent the CPU and its memory caches from reordering loads and stores in a more 'optimal' way, which would be fine in a single-threaded environment but disastrous in a multi-core (i.e. any modern PC or phone) system.
There is a cost for this safety, because thread A's cache must be flushed before thread B reads the same data, and flushing caches to memory is hideously slow compared to cached memory access. But c'est la vie. It's the only way to make concurrent execution safe.
This is why we prefer that if possible, in an SMP system, each thread has its own copy of data on which to work. We want to minimise not only the time spent in a lock, but also the number of times we cross a fence.
I could go on to talk about the std::memory_order modifiers, but that is a dark and dangerous hole, which experts often get wrong and in which beginners have no hope of getting it right.