marka_17 marka_17 - 2 years ago 55
C Question

Improving speed of multithreading program

I have a following simple problem : I need to create three threads and in every threads do certain operation. First thread need to add 100 to array[0] and subtract 101 from array[1], second thread need to add 200 to array[1] and subtract 201 from array[2] and in the end third thread need to add 300 to array[2] and subtract 301 from array[0].

Below is right solution but run time is very huge. If I carry out this task using one thread run time will be less 1 second but three thread increase run time to approximately 10 sec(+- 2 sec).
What is the problem? I thought that solution with three threads must be more fast. May be I use mutex in wrong way?

#include <stdio.h>
#include <pthread.h>
#include <limits.h>

enum {SIZE = 3, ITER = 1000000};
double array[SIZE] = {};
pthread_t threads[SIZE];

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void *func(void *arg) {
int n = *(int *)arg;
int tmp = 100 * (n + 1),
tmp2 = tmp + 1;
for (int i = 0; i != ITER; ++i) {
array[n % SIZE] += tmp;
array[(n + 1) % SIZE] -= tmp2;
return NULL;

int main() {
int n[SIZE];
for (int i = 0; i != SIZE; ++i) {
n[i] = i;
pthread_create(threads + i, NULL, func, n + i);

for (int i = 0; i != SIZE; ++i) {
pthread_join(threads[i], NULL);

for (int i = 0; i != SIZE; ++i) {
printf("%.10g ", array[i]);

return 0;

Answer Source

Mutex carry a high overhead. But more impotently, it serializes the execution. Only one thread can run at any point in time because only one can hold the mutex. So what you got is a serial execution with an extra synchronization overhead.

Instead, there are several approaches for this:

  1. You could partition your operations in a way that each thread access different indexes of the array. This way you could avoid using mutex at all. However, this will require a complete redesign of your solution.

  2. You could make each thread work on a local copy of the array, then combine the results of all the threads on a single thread. This is much simpler because all you have to do is copy the data to the threads and remove the mutex.

  3. You could use a mutex for each array index, but that seems a bit extreme because of the high memory overhead and mutex overhead when the collision probability is very low.

To conclude, using option 1 will yield the best performance, but using option 2 will yield a significant speedup compared to the serial version, with relatively little design effort.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download