Jaege Jaege - 3 months ago 6
C++ Question

Efficiency of postincrement v.s. preincrement in C++

I usually think that preincrement is more efficient than postincrement in C++. But when I read the book Game Engine Architecture(2nd ed.) recently, there is a section says that postincrement is prefered than preincrement in for loop. Because, as I quote, "preincrement introduces a data dependency into your code -- the CPU must wait for the increment operation to be completed before its value can be used in the expression." Is this true? (It is really subverted my idea about this problem.)

Here is the quote from the section in case you are interested:


5.3.2.1 Preincrement versus Postincrement

Notice in the above example that we are using C++’s postincrement operator,
p++
, rather than the preincrement operator,
++p
. This is a subtle but sometimes important optimization. The preincrement operator increments the contents of the variable before its (now modified) value is used in the expression. The postincrement operator increments the contents of the variable after it has been used. This means that writing
++p
introduces a data dependency into your code -- the CPU must wait for the increment operation to be completed before its value can be used in the expression. On a deeply pipelined CPU, this introduces a stall. On the other hand, with
p++
there is no data dependency. The value of the variable can be used immediately, and the increment operation can happen later or in parallel with its use. Either way, no stall is introduced into the pipeline.

Of course, within the “update” expression of a
for
loop (
for(init_expr;
test_expr; update_expr) { ... }
), there should be no difference between
pre- and postincrement. This is because any good compiler will recognize that
the value of the variable isn’t used in
update_expr
. But in cases where the
value is used, postincrement is superior because it doesn’t introduce a stall
in the CPU’s pipeline. Therefore, it’s good to get in the habit of always using
postincrement, unless you absolutely need the semantics of preincrement.


Edit: Add "the above example".

void processArray(int container[], int numElements)
{
int* pBegin = &container[0];
int* pEnd = &container[numElements];
for (int* p = pBegin; p != pEnd; p++)
{
int element = *p;
// process element...
}
}

void processList(std::list<int>& container)
{
std::list<int>::iterator pBegin = container.begin();
std::list<int>::iterator pEnd = container.end();
std::list<inf>::iterator p;
for (p = pBegin; p != pEnd; p++)
{
int element = *p;
// process element...
}
}

Answer

preincrement introduces a data dependency into your code -- the CPU must wait for the increment operation to be completed before its value can be used in the expression."

Is this true?

It is true - although perhaps overly strict. Pre increment doesn't necessarily introduce a data dependency - but it can.

A trivial example for exposition:

a = b++ * 2;

Here, the increment can be executed in parallel with the multiplication. The operands of both the increment and the multiplication are immediately available and do not depend on the result of either operation.

Another example:

a = ++b * 2;

Here, the multiplication must be executed after the increment, because one of the operands of the multiplication depends on the result of the increment.

Of course, these statements do slightly different things, so the compiler might not always be able to transform the program from one form to the other while keeping the semantics the same - which is why using the post increment might make a slight difference in performance.

A practical example, using a loop:

for(int i= 0; arr[i++];)
    count++;

for(int i=-1; arr[++i];)
    count++;

One might think that the latter is necessarily faster if they reason that "post-increment makes a copy" - which would have been very true in the case of non-fundamental types. However, due to the data dependency (and because int is a fundamental type with no overload function for increment operators), the former can theoretically be more efficient. Whether it is depends on the cpu architecture, and the ability of the optimizer.

For what it's worth - in a trivial program, on x86 arch, using g++ compiler with optimization enabled, the above loops had identical assembly output, so they are perfectly equivalent in that case.


Rules of thumb:

If the counter is a fundamental type and the result of increment is not used, then it makes no difference whether you use post/pre increment.

If the counter is not a fundamental type and the result of the increment is not used and optimizations are disabled, then pre increment may be more efficient. With optimizations enabled, there is no difference.

If the counter is a fundamental type and the result of increment is used, then post increment can theoretically be marginally more efficient - in some cpu architecture - in some context - using some compiler.

If the counter is a complex type and the result of the increment is used, then pre increment is typically faster than post increment. Also see R Sahu's answer regarding this case.

Comments