BMC BMC - 1 month ago 5x
C++ Question

Does hardware consolidate multiple code operations into one physical CPU operation?

I've read a 2006 article about how CPUs operate on whole L1 cache lines even when you only need a small fraction of what the line contains (e.g. loading a whole L1 line to write to a Boolean variable is obviously overkill). The article encouraged optimizing by managing memory in an L1-cache-friendly way.

Let's say I have two variables that just happen to be consecutive in memory, and in my code I write to both of them consecutively.

Does the hardware consolidate my two code operations into one physical operation on a single L1 line (assuming the CPU has an L1 cache line big enough to hold both variables), or not?

Is there some way to suggest such a thing to the CPU in C++ or C?

If the hardware doesn't do consolidation in any way, do you think implementing such a thing in code could yield better performance? For example, by allocating a memory block the size of an L1 line and populating it with as many hot-data variables as possible?


The size of the cache line is primarily relevant for concurrency. It is the smallest block of data that can be synchronized between multiple processors.

As you suggest, the whole cache line must also be loaded to operate on just a few bytes of it. If you do multiple operations on the same processor, though, it won't need to continually reload the line. It is actually cached, as the name suggests, and this includes caching the writes to the data. So long as only one processor is accessing the data, you can usually rest assured that it is being done efficiently.

In cases where multiple processors access the data, it may be helpful to align it. The C++ alignas specifier, or compiler extensions, can help you get data structures that are aligned the way you want.
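
As a rough illustration of the alignment idea, here is a minimal sketch. It assumes a 64-byte cache line, which is common on x86-64 but not guaranteed; the names kCacheLine, HotData, PaddedCounter and same_cache_line are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Check your target CPU before relying on this.
constexpr std::size_t kCacheLine = 64;

// Hot fields grouped in one struct that starts on a cache-line boundary,
// so consecutive writes to them touch the same line.
struct alignas(kCacheLine) HotData {
    bool ready;
    bool dirty;
    std::uint32_t count;
};

// A counter given a whole line to itself, so that two counters written by
// different processors never share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value;
};

// sizeof is rounded up to a multiple of the alignment.
static_assert(alignof(HotData) == kCacheLine, "HotData must start on a line boundary");
static_assert(sizeof(HotData) == kCacheLine, "HotData should occupy exactly one line");
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Returns true if both addresses fall in the same kCacheLine-sized block.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLine ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
}
```

With this layout, the members of one HotData object share a line, while adjacent elements of a PaddedCounter array land on different lines. Whether either arrangement actually helps depends on the access pattern, so it is worth measuring before committing to it.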

You may be interested in my article CPU Reordering – What is actually being reordered? which gives a few insights to what happens (at least logically) at the low level.