Narek Atayan Narek Atayan - 9 months ago 43
C++ Question

How can compiler optimizations affect on a cache-unfriendly algorithm?

I've noticed an interesting behavior in the code of this question which is also comes from Agner Fog in Optimizing software in C++ and it reduces to how data is accessed and stored in the cache (cache associativity). The explanations is clear for me, but then someone pings about

volatile
...
That is if we add
volatile
qualifier to the matrix declaration:
volatile int mat[MATSIZE][MATSIZE];
the running time for value
512
dramatically decreases: 2144 → 1562 μs.

As we know
volatile
prevents compilers from caching the value (in a CPU register) and from optimizing away accesses to that value when they seem unnecessary from the POV of a program.

One possible version assumes that the computation process happens only in RAM and no cpu caches is used in the case of
volatile
. But on the other hand the run-time for value
513
again is less than for
512
:
1490 μs
...

Answer Source

Unfortunately I cannot confirm that the volatile version is running faster. I did a test run of both the volatile and the non-volatile version, and the time comparison of both can be seen in the chart below. When measuring performance in order for optimizing code it is always important to take several steps (not just one or two, but in the range of thousands) and take the mode (https://en.wikipedia.org/wiki/Mode_(statistics)) of the gathered data as per Alexandrescu's Fastware.

enter image description here

There are various peaks, and deep valleys, but looking at the chart you cannot conclude that the volatile is faster.

Indeed, the code generated by the compilers is different, but not to such an extent, you can check it out on https://godbolt.org/g/ILw3tg