I've read a 2006 article about how CPUs operate on whole L1 cache lines even when you only need a small fraction of what the line contains (e.g. loading a whole L1 line just to write to a Boolean variable is obviously overkill). The article encouraged optimizing by managing memory in an L1-cache-friendly way.
Let's say I have two
The size of the cache line matters primarily for concurrency: it is the smallest block of data that can be kept coherent between multiple processors.
It is also, as you suggest, necessary to load the whole cache line to operate on just a few bytes of it. If you perform multiple operations on the same processor, though, it won't need to continually reload the line; it is actually cached, as the name suggests, and that includes caching writes to the data. As long as only one processor is accessing the data, you can usually rest assured that it is being handled efficiently.
In cases where multiple processors access the same data, it may be helpful to align it. The C++ alignas specifier, or compiler extensions, can get you data structures aligned the way you want.
You may be interested in my article CPU Reordering – What is actually being reordered?, which gives a few insights into what happens (at least logically) at the low level.