neverlastn - 2 months ago
C++ Question

Are atomic types necessary in multi-threading? (OS X, clang, c++11)

I'm trying to demonstrate that it's a very bad idea not to use std::atomic<>s, but I can't manage to create an example that reproduces the failure. I have two threads and one of them does:

{
  foobar = false;
}


and the other:

{
  if (foobar) {
    // ...
  }
}
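
Put together, the kind of harness I have in mind looks roughly like this (just a sketch: the counting loop, the iteration count and the helper names are illustrative, and the CPU-affinity hinting mentioned below is left out):

#include <atomic>
#include <cstdio>
#include <thread>

// Swap the alias to compare the two variants.
// using flag_type = std::atomic_bool;
using flag_type = bool;

flag_type foobar{true};

void writer() {
  foobar = false;
}

void reader() {
  long seen_true = 0;
  for (int i = 0; i < 100000000; ++i) {
    if (foobar) {       // with plain bool this load may be hoisted out of the loop
      ++seen_true;
    }
  }
  std::printf("saw true %ld times\n", seen_true);
}

int main() {
  std::thread t1(reader);
  std::thread t2(writer);
  t1.join();
  t2.join();
}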


The type of foobar is either bool or std::atomic_bool, and it's initialized to true. I'm using OS X Yosemite and I even tried to use this trick to hint via CPU affinity that I want the threads to run on different cores. I run such operations in loops etc., and in any case there's no observable difference in execution. I ended up inspecting the generated assembly with clang (clang -std=c++11 -lstdc++ -O3 -S test.cpp) and I see that the asm differences on the read side are minor (without atomic on the left, with atomic on the right):

[screenshot: read-side assembly, without atomic (left) vs. with atomic (right)]

No mfence or anything that "dramatic". On the write side, something more "dramatic" happens:

[screenshot: write-side assembly, without atomic (left) vs. with atomic (right)]

As you can see, the atomic<> version uses xchgb, which has an implicit lock. When I compile with a relatively old version of gcc (v4.5.2), I can see all sorts of mfences being added, which also indicates there's a serious concern.
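
For what it's worth, the xchgb comes from the default memory_order_seq_cst: with a weaker ordering the store is still atomic and data-race free, but on x86 it typically compiles down to a plain mov. A small sketch that can be fed to the same clang -S invocation (the function names here are just for illustration):

#include <atomic>

std::atomic_bool foobar{true};

void write_seq_cst() {
  foobar.store(false);                             // default seq_cst: xchg (or mov + mfence)
}

void write_release() {
  foobar.store(false, std::memory_order_release);  // typically a plain mov on x86
}

bool read_acquire() {
  return foobar.load(std::memory_order_acquire);   // typically a plain mov on x86 as well
}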

I kind of understand that "x86 implements a very strong memory model" (ref) and that mfences might not be necessary, but does that mean that, unless I want to write cross-platform code that e.g. supports ARM, I don't really need to use any atomic<>s unless I care about consistency at the ns level?

I've watched Herb Sutter's "atomic<> Weapons" talk, but I'm still struck by how difficult it is to create a simple example that reproduces those problems.

Answer

The big problem with data races is that they're undefined behavior, not guaranteed wrong behavior. This, in conjunction with the general unpredictability of threads and the strength of the x64 memory model, means that it gets really hard to create reproducible failures.

A slightly more reliable failure mode is when the optimizer does unexpected things, because you can observe those in the assembly. Of course, the optimizer is notoriously finicky as well and might do something completely different if you change just one line of code.

Here's an example failure that we had in our code at one point. The code implemented a sort of spin lock, but didn't use atomics.

bool operation_done;   // plain bool: no atomics, no lock

void thread1() {
  while (!operation_done) {
    sleep();             // some sleep/yield call; the details don't matter here
  }
  // do something that depends on the operation being done
}

void thread2() {
  // do the operation
  operation_done = true;
}

This worked fine in debug mode, but the release build got stuck. Debugging showed that execution of thread1 never left the loop, and looking at the assembly, we found that the condition was gone; the loop was simply infinite.

The problem was that the optimizer realized that, under its memory model, operation_done could not possibly change within the loop (that would have been a data race), and thus it "knew" that once the condition was true, it would stay true forever.
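
In effect, the optimizer was allowed to transform the loop as if it had been written like this (a sketch of the transformation, not actual compiler output):

void thread1() {
  bool done = operation_done;   // a single load, hoisted out of the loop
  if (!done) {
    for (;;) {                  // the loop condition has been folded away
      sleep();
    }
  }
  // do something that depends on the operation being done
}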

Changing the type of operation_done to atomic_bool (or actually, a pre-C++11 compiler-specific equivalent) fixed the issue.
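
In C++11 terms, the fixed version looks roughly like this (the default seq_cst ordering is used for brevity; acquire/release would also be enough here):

#include <atomic>

void sleep();   // same sleep/yield placeholder as above

std::atomic<bool> operation_done{false};

void thread1() {
  while (!operation_done) {   // an atomic load on every iteration; cannot be hoisted away
    sleep();
  }
  // everything thread2 did before its store is visible here
}

void thread2() {
  // do the operation
  operation_done = true;      // atomic store; guaranteed to become visible to thread1
}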
