Draxent Draxent - 2 months ago 13
C++ Question

C++ Optimize if/else statement

I wrote the following code to simulate and simplify what is happening in my application.

In this simplification, I have the if and else branch that are executing the same code, but writing in a different portion of memory. So I thought to use an array of two entries, and depending the statement condition the first or the second entry is updated.

This solution brings the expected speed up.

But when during the execution we have a random access for each iteration, the improvement almost disappears. To show this bizarre behavior, I have used the template to activate or deactivate the use of if-statement and random access, i.e..

useif: true when the if-statement is used, false when the memory access is used.

rand_access: true when we have a random access for each iteration, false otherwise.

#include <chrono>
#include <iostream>
#include <vector>
#include <numeric>
#include <algorithm>
#define N 1000000000
using namespace std;
using namespace std::chrono;

template <bool useif, bool rand_access>
void exec(vector<int>& V, vector<bool>& B) {
auto start = high_resolution_clock::now();
int sum[2], sum1 = 0, sum2 = 0;
sum[0] = 0; sum[1] = 0;
for ( int i = 0; i < N; i++ ) {
const int index = (rand_access) ? V[i] : i;
if ( useif ){
if ( B[index] ) sum2 += V[i];
else sum1 += V[i];
} else
sum[B[index]] += V[i];
}
auto t = std::chrono::duration_cast<milliseconds>(high_resolution_clock::now() - start);
std::cout << "Time useif="<<useif<<", rand_access="<<rand_access<<" : " << t.count() << " ms" << std::endl;
std::cout << (sum1+sum2+sum[0]+sum[1]) << std::endl;
}

int main() {
vector<int> V(N);
vector<bool> B(N, false);
iota( V.begin(), V.end(), 0 );
random_shuffle( V.begin(), V.end() );
fill( B.begin(), B.begin() + B.size()/2, true);
random_shuffle( B.begin(), B.end() );
exec<false, false>(V, B);
exec<false, true>(V, B);
exec<true, false>(V, B);
exec<true, true>(V, B);
return 0;
}


On my machine, compiling with g++ --std=c++11 -O3 -march=native -mtune=native, I obtain the following results:

Time useif=0, rand_access=0 : 1518 ms

Time useif=0, rand_access=1 : 10791 ms

Time useif=1, rand_access=0 : 4384 ms

Time useif=1, rand_access=1 : 12214 ms

So, there is a speed up of 2.8 substituting the if-statement with a memory access, when there is NOT a random access involved, otherwise the performance are really close (1.1 speed up).

I don't understand why this is happening and how can I deal with it, i.e. how can I optimize the if-statement knowing that the if and else branch are executing the same code?

Answer

Your optimization of if/else with an array is the correct one. It always gives you an improvement, but the significance of the improvement depends on other factors as well.

Your experiment shows relative impacts of branch elimination and cache access optimization.

When the code accesses memory in order, it takes advantage of CPU cache optimization due to locality of reference, "paying" for only a small fraction of its memory accesses. With 64-byte cache lines, it acts like an incredible "buy one, get fifteen free" policy for 4-byte integers stored in consecutive locations. It lets your CPU keep on adding with very little wait for the data from memory.

When the code has no branching, it takes advantage of CPU instruction pipeline. Hitting an if with a condition that is hard to predict stalls the pipeline, so fewer instructions are "in flight" at the same time.

Going from random access with branching to sequential access with branching saves you 7.8 seconds; eliminating branching saves you an additional 2.8 seconds on top of that.

In contrast, eliminating branching without sequential access gives you only a 1.5 seconds improvement, because eliminating pipeline stalls becomes a lot less important when the CPU is waiting for memory anyway.

Comments