Question

Do I need a memory barrier?

In the C99 example below, is the buffer_full flag guaranteed to be set (even with -O2 optimizations enabled) only after the buffer has been written, and cleared only after it has been read? Or do I need a memory barrier to ensure correct ordering?

I expect this to be run on a system where aligned 32-bit reads and writes are atomic.

Assume only one instance of each thread is running, and that no other threads access buffer or buffer_full.

char buffer[100];
int buffer_full;

// write interesting data to the buffer. does not read.
void fill_buffer(char* buffer, size_t buffsz);
// read the interesting data in the buffer. does not write.
void use_buffer(const char* buffer, size_t buffsz);

void writer_thread()
{
    if (!buffer_full) {
        fill_buffer(buffer, sizeof(buffer));
        // is a memory barrier needed here?
        buffer_full = 1;
    }
}

void reader_thread()
{
    if (buffer_full) {
        use_buffer(buffer, sizeof(buffer));
        // is a memory barrier needed here?
        buffer_full = 0;
    }
}

Answer

I interpret you to be asking whether a compiler may reorder the assignments to buffer_full with the calls to fill_buffer() and use_buffer(). Such a reordering (like any optimization) is permitted only if it does not alter the externally observable behavior of the program.

In this case, because buffer_full has external linkage, the compiler is unlikely to be able to prove that the optimization is permitted. It might be able to do so if the definitions of fill_buffer() and use_buffer(), and of every function they call in turn, are in the same translation unit as writer_thread() and reader_thread(), but even that depends on their implementations. If a conforming compiler cannot prove that the optimization is allowed, it must not perform it.

However, since your naming implies that the two functions run in different threads, note that without synchronization actions such as a memory barrier you cannot be confident about the relative order in which one thread will perceive another thread's modifications to shared, non-_Atomic, non-volatile data.
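
For illustration only, here is one way such a barrier could be expressed with C11 explicit fences, assuming a compiler that provides <stdatomic.h> (your example says C99, so treat this as a sketch of the idea rather than a drop-in fix). The flag is accessed with relaxed atomic operations so that the flag accesses themselves are not a data race (see the next paragraph); the fences then order the buffer accesses around them:

#include <stdatomic.h>
#include <stddef.h>

char buffer[100];
atomic_int buffer_full;   // relaxed atomic accesses avoid a data race on the flag itself

void fill_buffer(char* buffer, size_t buffsz);
void use_buffer(const char* buffer, size_t buffsz);

void writer_thread(void)
{
    if (!atomic_load_explicit(&buffer_full, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_acquire);  // pairs with the reader's release fence
        fill_buffer(buffer, sizeof(buffer));
        atomic_thread_fence(memory_order_release);  // buffer writes become visible before the flag is set
        atomic_store_explicit(&buffer_full, 1, memory_order_relaxed);
    }
}

void reader_thread(void)
{
    if (atomic_load_explicit(&buffer_full, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_acquire);  // pairs with the writer's release fence
        use_buffer(buffer, sizeof(buffer));
        atomic_thread_fence(memory_order_release);  // buffer reads complete before the flag is cleared
        atomic_store_explicit(&buffer_full, 0, memory_order_relaxed);
    }
}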

Moreover, if one thread writes a non-atomic variable and another thread accesses that same variable (read or write), then there is a data race unless a synchronization action or an atomic operation intervenes between the two in every possible overall order of operations. volatile variables don't really help here (see Why is volatile not considered useful in multithreaded C or C++ programming?). If you make buffer_full atomic, however, or if you implement your functions using atomic read and write operations on it, then that will serve to avoid data races involving not only that variable, but buffer as well (for your present code structure).
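
For example (again assuming a C11 compiler with <stdatomic.h>; this is a sketch, not the only valid arrangement), making the flag _Atomic and pairing release stores with acquire loads orders the buffer accesses without any explicit fence:

#include <stdatomic.h>
#include <stddef.h>

char buffer[100];
_Atomic int buffer_full;

void fill_buffer(char* buffer, size_t buffsz);
void use_buffer(const char* buffer, size_t buffsz);

void writer_thread(void)
{
    // acquire pairs with the reader's release store of 0
    if (!atomic_load_explicit(&buffer_full, memory_order_acquire)) {
        fill_buffer(buffer, sizeof(buffer));
        // release: the buffer contents are published before the flag is observed as 1
        atomic_store_explicit(&buffer_full, 1, memory_order_release);
    }
}

void reader_thread(void)
{
    // acquire pairs with the writer's release store of 1
    if (atomic_load_explicit(&buffer_full, memory_order_acquire)) {
        use_buffer(buffer, sizeof(buffer));
        // release: the buffer has been fully read before the flag is observed as 0
        atomic_store_explicit(&buffer_full, 0, memory_order_release);
    }
}

Plain assignments and reads of an _Atomic variable (e.g. buffer_full = 1;) default to sequentially consistent ordering, which is also correct here, just potentially more expensive on some architectures.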