jdi jdi - 3 months ago 13
C++ Question

Does const ref lvalue to non-const func return value specifically reduce copies?

I have encountered a C++ habit that I have tried to research in order to understand its impact and validate its usage. But I can't seem to find the exact answer.

std::vector< Thing > getThings();

void doWork() {
    const std::vector< Thing > &things = getThings();
}


Here we have some function that returns a non-const& value. The habit I am seeing is the usage of a const& lvalue when assigning the return value from the function. The proposed reasoning for this habit is that it reduces a copy.

Now I have been researching RVO (Return Value Optimization), copy elision, and C++11 move semantics. I realize that a given compiler could choose to prevent a copy via RVO regardless of the use of const& here. But does the usage of a const& lvalue here have any kind of effect on non-const& return values in terms of preventing copies? And I am specifically asking about pre-C++11 compilers, before move semantics.

My assumption is that either the compiler implements RVO or it does not, and that saying the lvalue should be const& doesn't hint or force a copy-free situation.

Edit

I am specifically asking about whether const& usage here reduces a copy, and not about the lifetime of the temporary object, as described in "the most important const".

Further clarification of question

Is this:

const std::vector< Thing > &things = getThings();


any different than this:

std::vector< Thing > things = getThings();


in terms of reducing copies? Or does it not have any influence on whether the compiler can reduce copies, such as via RVO?

Answer

Hey so your question is:

"When a function returns a class instance by value, and you assign it to a const reference, does that avoid a copy constructor call?"

Ignoring the lifetime of the temporary, as that’s not the question you’re asking, we can get a feel for what happens by looking at the assembly output. I’m using clang, llvm 7.0.2.

Here’s something bog standard. Return by value, nothing fancy.

Test A

class MyClass
{
public:
    MyClass();
    MyClass(const MyClass & source);
    long int m_tmp;
};

MyClass createMyClass();

int main()
{
    const MyClass myClass = createMyClass();
    return 0;
}

If I compile with “-O0 -S -fno-elide-constructors” I get this.

_main:
    pushq   %rbp                    # Boiler plate
    movq    %rsp, %rbp              # Boiler plate
    subq    $32, %rsp               # Reserve 32 bytes for stack frame
    leaq    -24(%rbp), %rdi         # arg0 = &___temp_items = rdi = rbp-24
    movl    $0, -4(%rbp)            # *(rbp-4) = 0: main's implicit return-value slot, which clang emits at -O0
    callq   __Z13createMyClassv     # createMyClass(arg0)
    leaq    -16(%rbp), %rdi         # arg0 = & myClass
    leaq    -24(%rbp), %rsi         # arg1 = &___temp_items
    callq   __ZN7MyClassC1ERKS_     # MyClass::MyClass(arg0, arg1)
    xorl    %eax, %eax              # eax = 0, the return value for main
    addq    $32, %rsp               # Pop stack frame
    popq    %rbp                    # Boiler plate
    retq

We are looking at only the calling code. We’re not interested in the implementation of createMyClass. That’s compiled somewhere else. So createMyClass creates the class inside a temporary and then that gets copied into myClass.

Simples.

What about the const ref version?

Test B

class MyClass
{
public:
    MyClass();
    MyClass(const MyClass & source);
    long int m_tmp;
};

MyClass createMyClass();

int main()
{
    const MyClass & myClass = createMyClass();
    return 0;
}

Same compiler options.

_main:                              # Boiler plate
    pushq   %rbp                    # Boiler plate
    movq    %rsp, %rbp              # Boiler plate
    subq    $32, %rsp               # Reserve 32 bytes for the stack frame
    leaq    -24(%rbp), %rdi         # arg0 = &___temp_items = rdi = rbp-24
    movl    $0, -4(%rbp)            # *(rbp-4) = 0: main's implicit return-value slot, which clang emits at -O0
    callq   __Z13createMyClassv     # createMyClass(arg0)
    xorl    %eax, %eax              # eax = 0, the return value for main
    leaq    -24(%rbp), %rdi         # rdi = &___temp_items
    movq    %rdi, -16(%rbp)         # &myClass = rdi = &___temp_items;
    addq    $32, %rsp               # Pop stack frame
    popq    %rbp                    # Boiler plate
    retq

No copy constructor call, and therefore more efficient, right?

What happens if we remove “-fno-elide-constructors” for both versions, so that the compiler is allowed to elide copies again? Still keeping -O0.

Test A

_main:
    pushq   %rbp                    # Boiler plate
    movq    %rsp, %rbp              # Boiler plate
    subq    $16, %rsp               # Reserve 16 bytes for the stack frame
    leaq    -16(%rbp), %rdi         # arg0 = &myClass = rdi = rbp-16
    movl    $0, -4(%rbp)            # *(rbp-4) = 0: main's implicit return-value slot, which clang emits at -O0
    callq   __Z13createMyClassv     # createMyClass(arg0)
    xorl    %eax, %eax              # eax = 0, return value for main
    addq    $16, %rsp               # Pop stack frame
    popq    %rbp                    # Boiler plate
    retq

Clang has removed the copy constructor call.

Test B

_main:                              # Boiler plate
    pushq   %rbp                    # Boiler plate
    movq    %rsp, %rbp              # Boiler plate
    subq    $32, %rsp               # Reserve 32 bytes for the stack frame
    leaq    -24(%rbp), %rdi         # arg0 = &___temp_items = rdi = rbp-24
    movl    $0, -4(%rbp)            # *(rbp-4) = 0: main's implicit return-value slot, which clang emits at -O0
    callq   __Z13createMyClassv     # createMyClass(arg0)
    xorl    %eax, %eax              # eax = 0, return value for main
    leaq    -24(%rbp), %rdi         # rdi = &___temp_items
    movq    %rdi, -16(%rbp)         # &myClass = rdi
    addq    $32, %rsp               # Pop stack frame
    popq    %rbp                    # Boiler plate
    retq

Test B (assign to const reference) is the same as it was before. It now has more instructions than Test A.

What if we set optimisation to -O1?

_main:
    pushq   %rbp                    # Boiler plate
    movq    %rsp, %rbp              # Boiler plate
    subq    $16, %rsp               # Reserve 16 bytes for the stack frame
    leaq    -8(%rbp), %rdi          # arg0 = &___temp_items = rdi = rbp-8
    callq   __Z13createMyClassv     # createMyClass(arg0)
    xorl    %eax, %eax              # eax = 0, return value for main
    addq    $16, %rsp               # Pop stack frame
    popq    %rbp                    # Boiler plate
    retq

Both source files turn into this when compiled with -O1: they produce exactly the same assembly. This is also true for -O4.

The compiler doesn’t know about the contents of createMyClass so it can’t do anything more to optimise.

With the compiler I'm using, you get no performance gain from assigning to a const ref.

I imagine it's a similar situation for g++ and the Intel compiler, although it's always good to check.