Tomáš Zato - 1 month ago
C Question

Where is memory for x++ increment allocated?

I've just been explaining the details of i++ vs ++i to a friend. I was telling him how, without optimization, i++ in a for loop essentially means making a copy of your i that is never used for anything, since i++ can be described with this pseudocode:

tmp = i;
i = i + 1;
return tmp;


Then I realized there's one thing I don't actually know: where is the memory for our tmp allocated? Does it increase the memory required for the whole procedure/function? (That is, is it on the stack?)

I suppose it is, but how can I test that? If it matters, we're talking about the C99 standard and the GCC compiler, but I'd prefer a broader answer to get some perspective on the matter.

Answer

Your assumption that compilers always produce different code for ++i and i++ without optimization is false. Here's a look at pre- and post-increment on Godbolt (Compiler Explorer) with gcc 6.2 and no optimization:

The C Code

void pre() {
  int i = 0;
  ++i;   /* value of ++i is discarded */
}

void post() {
  int i = 0;
  i++;   /* value of i++ is discarded */
}

The Assembly (x86-64)

pre():
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 0
        add     DWORD PTR [rbp-4], 1
        nop
        pop     rbp
        ret
post():
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 0
        add     DWORD PTR [rbp-4], 1
        nop
        pop     rbp
        ret

Note that the compiled code is byte-for-byte identical here for i++ and ++i. Both simply add 1 to the memory location reserved on the stack for i. No temporary is created or needed.

You might object that I'm not using the value of the increment expression, so let's look at something that does use it:

The C Code

int pre(int i) {
  return ++i;
}

int post(int i) {
  return i++;
}

The Assembly (x86-64)

pre(int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        add     DWORD PTR [rbp-4], 1
        mov     eax, DWORD PTR [rbp-4]
        pop     rbp
        ret

post(int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     eax, DWORD PTR [rbp-4]
        lea     edx, [rax+1]
        mov     DWORD PTR [rbp-4], edx
        pop     rbp
        ret

Here, the assembly is different. The pre-increment version increments the variable with a single memory read-modify-write (RMW) add and then reloads it into eax, while the post-increment version loads i into eax first, computes i + 1 separately in edx, and stores that back. Looking at unoptimized code is always something of an exercise in futility, but I'm quite sure the post-increment version is faster here: its dependency chain is shorter, because the value returned in eax doesn't have to wait for the RMW add and the store-forwarded reload that follows it.

The key point is that even here no "temporary space" is allocated in memory. Only the instruction sequence changes, and a register (eax here) is used, at no extra cost, to hold the value of i from before the increment.

Of course, you shouldn't really read anything into unoptimized code. It isn't going to be used in practice and you can't really learn much about the efficiency of any construct by studying it, because the optimized code will vary wildly across different idioms.