phyxnj - 3 months ago
Linux Question

Why does the C++ code run faster after it has run multiple times?

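#include <time.h>
#include <unistd.h>
#include <iostream>
using namespace std;

const int times = 1000;
const int N = 100000;

// Empty delay loop; it only survives because the code is built without optimization.
void run() {
  for (int j = 0; j < N; j++) {
  }
}

int main() {
  clock_t main_start = clock();
  for (int i = 0; i < times; i++) {
    clock_t start = clock();
    run();
    // clock() ticks are microseconds on Linux (CLOCKS_PER_SEC == 1000000).
    cout << "cost: " << (clock() - start) / 1000.0 << " ms." << endl;
    // usleep(1000);  // sleep between runs; the 1000 us value is assumed
  }
  cout << "total cost: " << (clock() - main_start) / 1000.0 << " ms." << endl;
}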

Here is the example code. In the first 26 iterations of the timing loop, the
function costs about 0.4 ms, but after that the cost drops to 0.2 ms.

When the usleep call in the timing loop is uncommented, the delay loop takes 0.4 ms on every run and never speeds up. Why?

The code is compiled with gcc -O0 (no optimization), so the delay loop isn't optimized away. It's run on an Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz, with 3.13.0-32-generic Ubuntu 14.04.1 LTS.


My guess: after 26 iterations, Linux ramps the CPU up to max clock speed since your process uses its full timeslice a couple times in a row.
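A quick way to test this guess is to sample the frequency the kernel reports while the benchmark runs. The sketch below assumes a Linux system that exposes the cpufreq sysfs interface; read_cur_freq_khz is a hypothetical helper name, not something from the question, and the file it reads can be missing in VMs and containers.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Reads the kernel's current frequency value for one CPU, in kHz.
// Returns std::nullopt when the cpufreq sysfs file is absent or unreadable
// (for example in a container or a VM without cpufreq support).
std::optional<long> read_cur_freq_khz(int cpu) {
    std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                    "/cpufreq/scaling_cur_freq");
    long khz = 0;
    if (!(f >> khz)) return std::nullopt;
    return khz;
}
```

Printing this value next to each "cost" line should show the frequency stepping up around the iteration where the time drops, if the guess is right. Keep in mind that the process may be running on a different core than the one being sampled.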

If you checked with perf counters instead of wall-clock time, you'd see that the number of core clock cycles per delay loop stayed constant, confirming that it's just an effect of DVFS (dynamic voltage and frequency scaling, which all modern CPUs use to run at a more energy-efficient frequency and voltage most of the time).
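The cycle count can also be read from inside the program. This is a minimal sketch, assuming Linux and a kernel configuration that lets the process open a hardware counter through the perf_event_open system call (it is often blocked in containers or by a restrictive perf_event_paranoid setting). The names count_cycles and run_delay_loop are hypothetical, and the loop uses a volatile counter so that it behaves the same at any optimization level, which differs slightly from the -O0 loop in the question.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>
#include <optional>

// Empty delay loop; the volatile counter keeps the compiler from removing it.
void run_delay_loop(int n) {
    for (volatile int j = 0; j < n; j = j + 1) {
    }
}

// Counts user-space core clock cycles spent in one run of the delay loop.
// Returns std::nullopt if the counter cannot be opened or read.
std::optional<std::uint64_t> count_cycles(int n) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        // start stopped; enabled explicitly below
    attr.exclude_kernel = 1;  // count user-space cycles only
    attr.exclude_hv = 1;

    // pid = 0 and cpu = -1: measure this thread on whichever CPU it runs.
    long ret = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (ret < 0) return std::nullopt;
    int fd = static_cast<int>(ret);

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    run_delay_loop(n);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t cycles = 0;
    bool ok = read(fd, &cycles, sizeof(cycles)) ==
              static_cast<ssize_t>(sizeof(cycles));
    close(fd);
    if (!ok) return std::nullopt;
    return cycles;
}
```

Calling count_cycles(100000) inside the timing loop and printing the result next to the measured time should show a roughly flat cycle count while the time per run halves, if frequency scaling is the cause.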

If you tested on a Skylake with kernel support for the new power-management mode (hardware P-states, marketed as Speed Shift, where the hardware takes full control of the clock speed), ramp-up would happen much faster.

If you leave it running for a while on an Intel CPU with Turbo, you'll probably see the time per iteration increase again slightly once thermal limits require the clock speed to reduce back down to max sustained frequency.

Introducing a usleep prevents Linux's CPU frequency governor from ramping up the clock speed, because the process isn't generating 100% load even at minimum frequency. (i.e. the kernel's heuristic decides that the CPU is running fast enough for the workload that's running on it.)
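To put a rough number on "isn't generating 100% load", the busy fraction can be estimated from the burst length and the sleep length. The sketch below uses the 0.4 ms delay loop from the question and takes a 1 ms sleep as an illustrative value; load_fraction is a hypothetical helper, and it does not model the governor's actual heuristic or its thresholds.

```cpp
// Fraction of wall-clock time the CPU is busy when each burst of work
// is followed by a sleep. Both arguments are in milliseconds.
double load_fraction(double busy_ms, double sleep_ms) {
    return busy_ms / (busy_ms + sleep_ms);
}
```

With these numbers, load_fraction(0.4, 1.0) is 0.4 / 1.4, or about 29%. Real sleeps usually last a little longer than requested, so the actual load would be somewhat lower still, which is consistent with the governor seeing no reason to raise the clock speed.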