Anne Quinn - 2 months ago
C++ Question

Cost of dereferencing a variable on the stack, or dereferenced recently?

With a modern compiler, is it as expensive to dereference a pointer a second time, when the data it points to was dereferenced recently?

int * ptr = new int();
// ... lots of stuff ...
*ptr = 1; // may need to load the memory into the cpu
*ptr = 2; // accessed again, can I assume this will usually be loaded and cost nothing extra?


What if the pointer addresses a variable on the stack? Can I assume that reading/writing through a pointer to a stack variable costs the same as reading/writing the variable directly?

int var;
int * ptr = &var;

*ptr = 0; // will this cost the same as if I just said var = 0; ?


And finally, does this extend to more complicated things, such as manipulating a base object on the stack through an interface?

Derived derivedObject;
Base * basePtr = &derivedObject;
basePtr->value = 42;     // will this have the same cost as if I just--
basePtr->doSomething();  // --manipulated derivedObject directly?


Edit: I'm asking this to gain a deeper understanding; this is less a problem to be solved than it is a request for insight. Please don't worry about "premature-optimization" or other practical concerns, just give me all the rope you can :)

Answer

This question contains a number of ambiguities.

A simple rule of thumb is that dereferencing something will always have the same cost, except when it doesn't.

Several factors go into the cost of a dereference: whether the destination is in cache, whether it is paged in, and what code the compiler generates.
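To see the cache factor on its own, here is a minimal pointer-chasing sketch (the names `make_chain` and `chase` are made up for this illustration). Each step is a load whose address depends on the previous load, so the average time per step roughly approximates the cost of one dereference for a given working-set size.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Build a single random cycle over n slots: next[i] is the slot visited after i.
// Because each load depends on the previous one, the CPU cannot overlap them,
// and the random order keeps the hardware prefetcher from guessing ahead.
std::vector<std::size_t> make_chain(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Keep slot 0 first so the cycle starts (and ends) at index 0.
    std::shuffle(order.begin() + 1, order.end(), std::mt19937(seed));
    std::vector<std::size_t> next(n);
    for (std::size_t k = 0; k + 1 < n; ++k) next[order[k]] = order[k + 1];
    next[order[n - 1]] = order[0];
    return next;
}

// Follow the chain for `steps` dereferences, write the average time per
// dereference (in nanoseconds) to ns_per_step, and return the final index
// so the loop has an observable result and cannot be optimised away.
std::size_t chase(const std::vector<std::size_t>& next, std::size_t steps,
                  double& ns_per_step) {
    assert(steps > 0);
    const auto start = std::chrono::steady_clock::now();
    std::size_t i = 0;
    for (std::size_t s = 0; s < steps; ++s) i = next[i];
    const auto stop = std::chrono::steady_clock::now();
    ns_per_step =
        std::chrono::duration<double, std::nano>(stop - start).count() /
        static_cast<double>(steps);
    return i;
}
```

Comparing a chain that fits comfortably in cache (say `make_chain(1 << 12, 1)`, 32 KiB of `std::size_t` on a 64-bit target) against one far larger than the last-level cache (say `1 << 24` entries, 128 MiB), with many steps and optimization enabled, typically shows a per-dereference cost that differs by an order of magnitude or more; the actual numbers depend entirely on the machine.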

For the code snippet

Obj* p = new Obj;
// <elided> //
p->something = 1;

Looking at this source code, we can't tell whether the executable will have p's value loaded, whether *p is in cache, or whether *p has even been accessed yet.

Obj* p = new Obj;
p->something = 1;

Even with nothing in between, we still can't be sure whether *p is paged in or cached, but most modern optimizing compilers will not emit code that stores p to memory and then fetches it back again before the dereference.

In practice, on modern hardware you really shouldn't be concerned with it; if you are, start by looking at the assembly.
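A small thing to try when reading the assembly is the question's own `*ptr = 1; *ptr = 2;` pair. In this minimal sketch (function names are hypothetical), the plain version is typically reduced by GCC and Clang at `-O2` to a single store of 2, because the first value is never read. Through a pointer to `volatile`, every access is observable behaviour, so both stores must be emitted.

```cpp
#include <cassert>

// Two stores through the same pointer with no read in between.
// Compile with e.g. `g++ -O2 -S -o - file.cpp` to view the assembly:
// optimizers typically keep only the second store, since the first is dead.
void store_twice(int* ptr) {
    *ptr = 1;
    *ptr = 2;
}

// Accesses through a pointer to volatile are observable behaviour,
// so the compiler must emit both stores, in order.
void store_twice_volatile(volatile int* ptr) {
    *ptr = 1;
    *ptr = 2;
}
```

Either way the final value is 2; only the number of memory writes the compiler is allowed to drop differs.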

I'll use two ends of the spectrum:

struct Obj { int something; int other; };

Obj* f() {
  Obj* p = new Obj;
  p->something = 1;
  p->other = 2;
  return p;
}

extern void fn2(Obj**);

Obj* h() {
  Obj* p = new Obj;
  fn2(&p);
  p->something = 1;
  fn2(&p);
  p->other = 2;
  return p;
}

This produces

f():
        subq    $8, %rsp
        movl    $8, %edi
        call    operator new(unsigned long)
        movl    $1, (%rax)
        movl    $2, 4(%rax)
        addq    $8, %rsp
        ret

and

h():
        subq    $24, %rsp
        movl    $8, %edi
        call    operator new(unsigned long)
        leaq    8(%rsp), %rdi
        movq    %rax, 8(%rsp)
        call    fn2(Obj**)
        movq    8(%rsp), %rax
        leaq    8(%rsp), %rdi
        movl    $1, (%rax)
        call    fn2(Obj**)
        movq    8(%rsp), %rax
        movl    $2, 4(%rax)
        addq    $24, %rsp
        ret

Here the compiler has to store the pointer on the stack and reload it after each call before dereferencing it, but that's a bit unfair: we passed &p, so the called function could have modified the pointer.
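To make the "could be modified" case concrete, here is a runnable sketch with a hypothetical definition of `fn2` (the answer only declares it) that redirects the caller's pointer when a replacement object has been set. The store after the call has to go to whatever `p` points at *after* `fn2` returns, which is why the reload is required.

```cpp
#include <cassert>

struct Obj { int something; int other; };

// Hypothetical fn2: if g_replacement is set, it overwrites the caller's pointer.
Obj* g_replacement = nullptr;

void fn2(Obj** pp) {
    if (g_replacement != nullptr) {
        *pp = g_replacement;
    }
}

// Mirrors the first h(): &p escapes into fn2. When fn2 is opaque to the
// compiler (e.g. defined in another translation unit), p must be kept in
// memory and reloaded after the call before storing through it.
Obj* store_after_call(Obj* p) {
    fn2(&p);
    p->something = 1;
    return p;
}
```

With no replacement set, the store lands in the object that was passed in; with one set, it lands in the replacement and the original is untouched.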

Obj* h() {
  Obj* p = new Obj;
  fn2(nullptr);
  p->something = 1;
  fn2(nullptr);
  p->other = 2;
  return p;
}

produces

h():
        pushq   %rbx
        movl    $8, %edi
        call    operator new(unsigned long)
        xorl    %edi, %edi
        movq    %rax, %rbx
        call    fn2(Obj**)
        xorl    %edi, %edi
        movl    $1, (%rbx)
        call    fn2(Obj**)
        movq    %rbx, %rax
        movl    $2, 4(%rbx)
        popq    %rbx
        ret

We're still seeing some register shenanigans (p is kept in the callee-saved register %rbx across the calls, which costs a push and a pop), but it's hardly expensive.

As for your questions about pointers to the stack, a good optimizer will be able to eliminate those, but again you have to consult the assembly generated by your chosen compiler for your particular platform.
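For the question's `int var; int* ptr = &var;` case, here is a minimal pair to compare in the assembly (function names are hypothetical). Since `ptr` provably points at `var` and its address never escapes, an optimizer can treat `*ptr` exactly like `var`; GCC and Clang at `-O2` typically compile both functions to the same instructions.

```cpp
#include <cassert>

// Writes 0 directly to a local and returns it.
int write_direct() {
    int var;
    var = 0;
    return var;
}

// Same thing, but every access goes through a pointer to the local.
int write_through_pointer() {
    int var;
    int* ptr = &var;
    *ptr = 0;
    return *ptr;
}
```

Both return 0, so `assert(write_direct() == write_through_pointer())` holds; the interesting part is checking that the generated code is identical.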

struct Obj { int something; int other; };

void fn(Obj*);

void f()
{
  Obj o;
  Obj* p = &o;
  p->something = 1;
  p->other = 1;
  fn(p);
}

This produces the following, where p has essentially been eliminated and the stores go straight to o's slot on the stack:

f():
        subq    $24, %rsp
        movq    %rsp, %rdi
        movl    $1, (%rsp)
        movl    $1, 4(%rsp)
        call    fn(Obj*)
        addq    $24, %rsp
        ret

Of course, if we passed &p to something, the compiler couldn't elide p entirely, but it might still be smart enough to avoid going through it when it doesn't absolutely have to.
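The same reasoning extends to the last part of the question, working through a base-class pointer. When the compiler can see which object the pointer refers to, a data-member access is an ordinary store, and the virtual call can often be resolved statically (devirtualization) or even inlined. A minimal sketch, with hypothetical `Base`/`Derived` definitions since the question doesn't show them:

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int doSomething() const { return value; }
    int value = 0;
};

// `final` means no class can derive from Derived, so calls through a
// Derived* or Derived& can also be resolved to Derived::doSomething.
struct Derived final : Base {
    int doSomething() const override { return value * 2; }
};

int call_through_base() {
    Derived d;      // object on the stack; its dynamic type is known here
    Base* b = &d;   // "interface" pointer to it
    b->value = 21;  // an ordinary store into d, like writing d.value = 21
    // The compiler can see that b points at a Derived, so it may call
    // Derived::doSomething directly instead of going through the vtable.
    return b->doSomething();
}
```

If the pointer instead arrives as a parameter from another translation unit, the compiler generally has to load the vtable pointer and make an indirect call, unless something like `final` or whole-program optimization lets it prove the target.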