Samin Khan - 2 months ago 16

C++ Question

I am writing a program that calculates the sum of every number up to 1000. For example, 1+2+3+4+5....+100. First, I assign summation jobs to 10 processors: Processor 0 gets 1-100, Processor 1 gets 101-200 and so on. The sums are stored in an array.

After all summations have been done parallelly, processors send their values to Processor 0 (Processor 0 receives values using nonblocking send/recv) and Processor 0 sums up all the values and displays the result.

Here is the code:

`#include <mpi.h>`

#include <iostream>

using namespace std;

int summation(int, int);

int main(int argc, char ** argv)

{

int * array;

int total_proc;

int curr_proc;

int limit = 0;

int partial_sum = 0;

int upperlimit = 0, lowerlimit = 0;

MPI_Init(&argc, &argv);

MPI_Comm_size(MPI_COMM_WORLD, &total_proc);

MPI_Comm_rank(MPI_COMM_WORLD, &curr_proc);

MPI_Request send_request, recv_request;

/* checking if 1000 is divisible by number of procs, else quit */

if(1000 % total_proc != 0)

{

MPI_Finalize();

if(curr_proc == 0)

cout << "**** 1000 is not divisible by " << total_proc << " ...quitting..."<< endl;

return 0;

}

/* number of partial summations */

limit = 1000/total_proc;

array = new int [total_proc];

/* assigning jobs to processors */

for(int i = 0; i < total_proc; i++)

{

if(curr_proc == i)

{

upperlimit = upperlimit + limit;

lowerlimit = (upperlimit - limit) + 1;

partial_sum = summation(upperlimit, lowerlimit);

array[i] = partial_sum;

}

else

{

upperlimit = upperlimit + limit;

lowerlimit = (upperlimit - limit) + 1;

}

}

cout << "** Partial Sum From Process " << curr_proc << " is " << array[curr_proc] << endl;

/* send and receive - non blocking */

for(int i = 1; i < total_proc; i++)

{

if(curr_proc == i) /* (i = current processor) */

{

MPI_Isend(&array[i], 1, MPI_INT, 0, i, MPI_COMM_WORLD, &send_request);

cout << "-> Process " << i << " sent " << array[i] << " to Process 0" << endl;

MPI_Irecv(&array[i], 1, MPI_INT, i, i, MPI_COMM_WORLD, &recv_request);

//cout << "<- Process 0 received " << array[i] << " from Process " << i << endl;

}

}

MPI_Finalize();

if(curr_proc == 0)

{

for(int i = 1; i < total_proc; i++)

array[0] = array[0] + array[i];

cout << "Sum is " << array[0] << endl;

}

return 0;

}

int summation(int u, int l)

{

int result = 0;

for(int i = l; i <= u; i++)

result = result + i;

return result;

}

Output:

`** Partial Sum From Process 0 is 5050`

** Partial Sum From Process 3 is 35050

-> Process 3 sent 35050 to Process 0

<- Process 0 received 35050 from Process 3

** Partial Sum From Process 4 is 45050

-> Process 4 sent 45050 to Process 0

<- Process 0 received 45050 from Process 4

** Partial Sum From Process 5 is 55050

-> Process 5 sent 55050 to Process 0

<- Process 0 received 55050 from Process 5

** Partial Sum From Process 6 is 65050

** Partial Sum From Process 8 is 85050

-> Process 8 sent 85050 to Process 0

<- Process 0 received 85050 from Process 8

-> Process 6 sent 65050 to Process 0

** Partial Sum From Process 1 is 15050

** Partial Sum From Process 2 is 25050

-> Process 2 sent 25050 to Process 0

<- Process 0 received 25050 from Process 2

<- Process 0 received 65050 from Process 6

** Partial Sum From Process 7 is 75050

-> Process 1 sent 15050 to Process 0

<- Process 0 received 15050 from Process 1

-> Process 7 sent 75050 to Process 0

<- Process 0 received 75050 from Process 7

** Partial Sum From Process 9 is 95050

-> Process 9 sent 95050 to Process 0

<- Process 0 received 95050 from Process 9

Sum is -1544080023

Printing the contents of the array:

`5050`

536870912

-1579286148

-268433415

501219332

32666

501222192

32666

1

0

I'd like to know what is causing this.

If I print the array before MPI_Finalize is invoked it works fine.

Answer

The most important flaw your program has is how you divide the work. In MPI, **every process is executing the main function**. Therefore, you must ensure that all the processes execute your `summation`

function if you want them to collaborate on building the result.

You don't need the for loop. Every process is executing the main separately. They just have different `curr_proc`

values, and you can compute which portion of the job they have to perform based on that:

```
/* assigning jobs to processors */
int chunk_size = 1000 / total_proc;
lowerlimit = curr_proc * chunk_size;
upperlimit = (curr_proc+1) * chunk_size;
partial_sum = summation(upperlimit, lowerlimit);
```

Then, how the master process receives all the other processes' partial sum is not correct.

- MPI rank values (
`curr_proc`

) start form 0 up to`MPI_Comm_size`

output value (`total_proc-1`

). - Only the process #1 is sending/receiving data.
- You are using the immediate version of send and receive:
`MPI_Isend`

and`MPI_recv`

but you are not waiting until those requests are completed. You should use`MPI_Waitall`

for that purpose.

The correct version would be something like the following:

```
if( curr_proc == 0 ) {
// master process receives all data
for( int i = 1; i < total_proc; i++ )
MPI_Recv( &array[i], MPI_INT, 1, i, 0, MPI_COMM_WORLD );
} else {
// other processes send data to the master
MPI_Send( &partial_sum, MPI_INT, 1, 0, 0, MPI_COMM_WORLD );
}
```

This all-to-one communication pattern is known as *gather*. In MPI there is a function which already performs this functionality: `MPI_Gather`

.

Finally, what you intent to perform is known as *reduction*: take a given amount of numeric values and generate a single output value by continuously performing a single operation (a sum, in your case). In MPI there is a function which does that, too: `MPI_Reduce`

.

I strongly suggest you to do some basic guided exercises before trying to make your own. MPI is difficult to understand at the beginning. Building a good base is vital for you to be able to add complexity later on. A hands on tutorial is also a good way of getting started into MPI.

EDIT: Forgot to mention that **you don't need to enforce an even divission of the problem size** (1000 in this case) by the number of resources (`total_proc`

). Depending on the case, you can either assign the remainder to a single process:

```
chunk_size = 1000 / total_proc;
if( curr_proc == 0 )
chunk_size += 1000 % total_proc;
```

Or balance it as much as possible:

```
int remainder = curr_proc < ( 1000 % proc )? 1 : 0;
lowerlimit = curr_proc * chunk_size /* as usual */
+ curr_proc; /* cumulative remainder */
upperlimit = (curr_proc + 1) * chunk_size /* as usual */
+ remainder; /* curr_proc remainder */
```

The second case, the load unbalance will be as much as 1, while in the first case the load unbalance can reach `total_proc-1`

in the worst case.

Source (Stackoverflow)

Comments