John London - 1 year ago
C++ Question

Efficient way to return stl vector by value from function

This is sort of an extension of the question "Efficient way to return a std::vector in c++".

#include <chrono>
#include <cstdio>
#include <utility>
#include <vector>

// Method 1: return the local vector by name.
std::vector<int> func1() {
    std::vector<int> v;
    for (int i = 0; i < 1e6; i++) v.emplace_back(i);
    return v;
}

// Method 2: return the local vector through std::move.
std::vector<int> func2() {
    std::vector<int> v;
    for (int i = 0; i < 1e6; i++) v.emplace_back(i);
    return std::move(v);
}

int main() {
    // count() returns a 64-bit tick count here, so cast and print with %lld, not %d.
    auto start1 = std::chrono::steady_clock::now();
    std::vector<int> v1 = func1();
    auto end1 = std::chrono::steady_clock::now();
    printf("%lld\n", static_cast<long long>(std::chrono::duration_cast<std::chrono::nanoseconds>(end1 - start1).count()));

    auto start2 = std::chrono::steady_clock::now();
    std::vector<int> v2 = func2();
    auto end2 = std::chrono::steady_clock::now();
    printf("%lld\n", static_cast<long long>(std::chrono::duration_cast<std::chrono::nanoseconds>(end2 - start2).count()));

    // Method 3: copy an already built vector (no function call involved).
    auto start3 = std::chrono::steady_clock::now();
    std::vector<int> v3 = v2;
    auto end3 = std::chrono::steady_clock::now();
    printf("%lld\n", static_cast<long long>(std::chrono::duration_cast<std::chrono::nanoseconds>(end3 - start3).count()));

    return 0;
}
In method 2, I explicitly tell the compiler that I want to move the vector instead of copying it, but running the code multiple times shows that method 1 actually outperforms method 2 sometimes, and even when method 2 wins, it is not by much.

Method 3 is consistently the best. How can I emulate method 3 when I must return from a function? (No, I cannot pass by reference.)

Using gcc 6.1.0

Answer Source

Method 1 - you are relying on Named Return Value Optimization (NRVO). This is actually the best option, since in optimized code no temporary object is constructed at all. If the compiler cannot apply NRVO, it falls back to move semantics, the same as in method 2.

Method 2 - you are effectively disabling NRVO and forcing a call to the move constructor for the destination std::vector. So this is not as good as method 1.
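To see the difference between methods 1 and 2 without timing noise, you can count constructor calls instead. This is only a minimal sketch: Tracker, make_plain, make_moved and report are hypothetical names I made up for illustration, not anything from the question. On compilers that apply NRVO I would expect make_plain to report zero moves and make_moved to report one, but the standard only permits NRVO rather than requiring it, so the first number can differ with compiler and flags.

```cpp
#include <cstdio>
#include <utility>

// Hypothetical helper type (not in the question): counts how it gets constructed.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Same shape as func1: a plain `return t;` leaves the compiler free to apply NRVO.
Tracker make_plain() {
    Tracker t;
    return t;
}

// Same shape as func2: the operand is no longer just the name of a local,
// so NRVO cannot apply and the move constructor has to run.
Tracker make_moved() {
    Tracker t;
    return std::move(t);
}

// Resets the counters, runs one factory, and prints what was counted.
template <typename F>
void report(const char* label, F make) {
    Tracker::copies = 0;
    Tracker::moves = 0;
    Tracker result = make();
    (void)result;
    std::printf("%s: copies=%d moves=%d\n", label, Tracker::copies, Tracker::moves);
}
```

Calling report("plain", make_plain) and report("moved", make_moved) from main prints the two sets of counters. Neither version should ever report a copy, because a local returned by name is treated as an rvalue when NRVO is not applied.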

Method 3 - you are actually copying the vector here, which in general is by far the WORST way to hand a vector over. It only looks fastest in your benchmark because the copy happens in one pass (one large chunk of memory, instead of many emplaces), while the timings for methods 1 and 2 also include filling the vector element by element. That is not replicable in your use case in method 1 or 2.
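Part of the gap is the repeated regrowth of the vector while it is being filled. If you happen to know the element count in advance (that is my assumption, the question does not say so), reserving up front gets you closer to the "one large chunk of memory" behaviour while still returning by value. A minimal sketch, where make_reserved and its parameter n are hypothetical names:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical variant of func1: the name and the `n` parameter are assumptions.
std::vector<int> make_reserved(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);  // one allocation up front, so the loop below never reallocates
    for (std::size_t i = 0; i < n; ++i) v.emplace_back(static_cast<int>(i));
    return v;      // plain return: NRVO or an implicit move, never a deep copy
}
```

This will not make building the vector as cheap as a bulk copy of existing data, since each element is still constructed individually, but it removes the reallocations from the measurement.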

How does NRVO work? When the function returns a single named object, in this case std::vector<int> v, the compiler does not create that vector as a separate object inside the function at all. The caller sets aside the unnamed result object and passes a hidden reference to it into your function, which then builds the vector directly in it.

Something like this will happen in optimized code:

std::vector<int> func1(std::vector<int>& hidden) {
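A hand-written approximation of that idea might look like the sketch below. The names func1_lowered and call_site_sketch are hypothetical, and it simplifies things: a real compiler passes the address of uninitialized storage and constructs the vector there, whereas this version receives an already constructed, empty vector by reference.

```cpp
#include <cstddef>
#include <vector>

// Hand-written analogue of func1 after NRVO; `func1_lowered` is a hypothetical name.
// The caller owns the result object and the function fills it in place.
void func1_lowered(std::vector<int>& hidden) {
    for (int i = 0; i < 1000000; i++) hidden.emplace_back(i);
    // Nothing to return: `hidden` already is the caller's vector.
}

// Roughly what `std::vector<int> v1 = func1();` turns into at the call site.
// Returns the element count only so the effect can be observed.
std::size_t call_site_sketch() {
    std::vector<int> v1;      // the caller provides the result object
    func1_lowered(v1);        // the callee builds the data directly in it
    return v1.size();
}
```

The point is that no second vector exists at any time, so there is nothing to copy or move when the function returns.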