Andreas Andreas - 17 days ago 4x
C++ Question

Why does memcpy performance deteriorate when used in multiple threads?

I wrote a short test program on Linux to see how memcpy performs when used in multiple threads. I didn't expect the result to be this devastating: execution time went from 3.8 seconds to over 2 minutes, while running two instances of the program concurrently took about 4.7 seconds. Why is this?

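// thread example
#include <iostream>
#include <thread>
#include <stdlib.h>   // rand
#include <string.h>   // memcpy
using namespace std;

void foo(/*int a[3],int b[3]*/)
{
    int a[3]={7,8,3};
    int b[3]={9,8,2};

    for(int i=0;i<100000000;i++){
        memcpy(a,b,12*rand()&1);
    }
}

int main()
{
#ifdef THREAD

    // threaded build: run foo() on four threads at once
    thread threads[4];
    for (char t=0; t<4; ++t) {
        threads[t] = thread( foo );
    }

    for (auto& th : threads) th.join();
    cout << "foo and bar completed.\n";

#else

    // non-threaded build: run the same loop on the main thread
    foo();

#endif

    return 0;
}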

Your memcpy does nothing, because its size argument 12 * rand() & 1 is always 0. Multiplication binds tighter than &, so the expression is read as (12 * rand()) & 1, and since 12 is even, the low bit of the product is always 0.
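To make the precedence point concrete, here is a small sketch. The function names are made up for illustration, the random value is passed in as a parameter so the results are reproducible, and the "intended" form is only my guess at what the question meant to write:

```cpp
#include <cstddef>

// Size argument as written in the question.
// '*' binds tighter than '&', so this is (12 * r) & 1.
// 12 * r is always even, so the low bit is always 0.
std::size_t size_as_written(unsigned r) {
    return 12 * r & 1;
}

// Assumed intent: copy either 0 or 12 bytes, depending on the low bit of r.
std::size_t size_intended(unsigned r) {
    return 12 * (r & 1);
}
```

Whatever value you pass in, size_as_written() gives 0, while size_intended() gives 12 for odd inputs and 0 for even ones.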

So you are simply measuring the time of rand(). That function keeps global state that may (or may not) be shared by all the threads. In your implementation it looks like the state is shared and access to it is synchronized, so the threads contend heavily for it and performance suffers.

Try using rand_r() instead, which keeps its state in a variable you pass in rather than in shared state (or use the newer C++ <random> generators):

  unsigned int r = 0;   // state for rand_r(), local to each thread
  for(int i=0;i<100000000;i++){
    memcpy(a,b,12*rand_r(&r)&1);
  }

On my machine, that reduces the multithreaded runtime from 30s to 0.7s (the single-threaded run took 2.2s). Naturally, this experiment says nothing about memcpy(), but it says something about shared global state...
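For the C++ generator route, a minimal sketch could look like the following. It assumes the goal is to copy either 0 or 12 bytes per iteration; copy_worker and run_workers are hypothetical names, and the seeds are arbitrary. Each thread owns its own std::mt19937 engine, so nothing is shared between threads:

```cpp
#include <cstddef>
#include <cstring>
#include <random>
#include <thread>
#include <vector>

// Hypothetical worker: the engine lives on this thread's stack,
// so there is no shared generator state to contend for.
// Returns the number of bytes copied so the work is observable.
std::size_t copy_worker(unsigned seed, int iterations) {
    int a[3] = {7, 8, 3};
    int b[3] = {9, 8, 2};
    std::mt19937 engine(seed);              // thread-private generator
    std::bernoulli_distribution coin(0.5);  // true about half the time
    std::size_t copied = 0;
    for (int i = 0; i < iterations; ++i) {
        std::size_t n = coin(engine) ? sizeof a : 0;
        std::memcpy(a, b, n);               // copies the whole array or nothing
        copied += n;
    }
    return copied;
}

// Runs copy_worker on thread_count threads, each with its own seed,
// and returns the bytes copied by each thread.
std::vector<std::size_t> run_workers(unsigned thread_count, int iterations) {
    std::vector<std::size_t> copied(thread_count, 0);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < thread_count; ++t) {
        threads.emplace_back([&copied, t, iterations] {
            copied[t] = copy_worker(t + 1, iterations);
        });
    }
    for (auto& th : threads) th.join();
    return copied;
}
```

Calling run_workers(4, 100000000) would mirror the four-thread test from the question. I have not timed this variant, so treat it as an illustration of keeping the generator state per thread rather than as a benchmark.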