Samin Khan Samin Khan - 2 months ago 16
C++ Question

MPI (Summation)

I am writing a program that calculates the sum of every number up to 1000. For example, 1+2+3+4+5....+100. First, I assign summation jobs to 10 processors: Processor 0 gets 1-100, Processor 1 gets 101-200 and so on. The sums are stored in an array.

After all summations have been done parallelly, processors send their values to Processor 0 (Processor 0 receives values using nonblocking send/recv) and Processor 0 sums up all the values and displays the result.

Here is the code:

#include <mpi.h>
#include <iostream>

using namespace std;

int summation(int, int);

int main(int argc, char ** argv)
{
int * array;
int total_proc;
int curr_proc;
int limit = 0;
int partial_sum = 0;
int upperlimit = 0, lowerlimit = 0;

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &total_proc);
MPI_Comm_rank(MPI_COMM_WORLD, &curr_proc);
MPI_Request send_request, recv_request;

/* checking if 1000 is divisible by number of procs, else quit */
if(1000 % total_proc != 0)
{
MPI_Finalize();
if(curr_proc == 0)
cout << "**** 1000 is not divisible by " << total_proc << " ...quitting..."<< endl;
return 0;
}

/* number of partial summations */
limit = 1000/total_proc;

array = new int [total_proc];

/* assigning jobs to processors */
for(int i = 0; i < total_proc; i++)
{
if(curr_proc == i)
{
upperlimit = upperlimit + limit;
lowerlimit = (upperlimit - limit) + 1;
partial_sum = summation(upperlimit, lowerlimit);
array[i] = partial_sum;
}
else
{
upperlimit = upperlimit + limit;
lowerlimit = (upperlimit - limit) + 1;
}
}

cout << "** Partial Sum From Process " << curr_proc << " is " << array[curr_proc] << endl;

/* send and receive - non blocking */
for(int i = 1; i < total_proc; i++)
{
if(curr_proc == i) /* (i = current processor) */
{
MPI_Isend(&array[i], 1, MPI_INT, 0, i, MPI_COMM_WORLD, &send_request);
cout << "-> Process " << i << " sent " << array[i] << " to Process 0" << endl;

MPI_Irecv(&array[i], 1, MPI_INT, i, i, MPI_COMM_WORLD, &recv_request);
//cout << "<- Process 0 received " << array[i] << " from Process " << i << endl;
}
}

MPI_Finalize();

if(curr_proc == 0)
{
for(int i = 1; i < total_proc; i++)
array[0] = array[0] + array[i];
cout << "Sum is " << array[0] << endl;
}

return 0;
}

int summation(int u, int l)
{
int result = 0;
for(int i = l; i <= u; i++)
result = result + i;
return result;
}


Output:

** Partial Sum From Process 0 is 5050
** Partial Sum From Process 3 is 35050
-> Process 3 sent 35050 to Process 0
<- Process 0 received 35050 from Process 3
** Partial Sum From Process 4 is 45050
-> Process 4 sent 45050 to Process 0
<- Process 0 received 45050 from Process 4
** Partial Sum From Process 5 is 55050
-> Process 5 sent 55050 to Process 0
<- Process 0 received 55050 from Process 5
** Partial Sum From Process 6 is 65050
** Partial Sum From Process 8 is 85050
-> Process 8 sent 85050 to Process 0
<- Process 0 received 85050 from Process 8
-> Process 6 sent 65050 to Process 0
** Partial Sum From Process 1 is 15050
** Partial Sum From Process 2 is 25050
-> Process 2 sent 25050 to Process 0
<- Process 0 received 25050 from Process 2
<- Process 0 received 65050 from Process 6
** Partial Sum From Process 7 is 75050
-> Process 1 sent 15050 to Process 0
<- Process 0 received 15050 from Process 1
-> Process 7 sent 75050 to Process 0
<- Process 0 received 75050 from Process 7
** Partial Sum From Process 9 is 95050
-> Process 9 sent 95050 to Process 0
<- Process 0 received 95050 from Process 9
Sum is -1544080023


Printing the contents of the array:

5050
536870912
-1579286148
-268433415
501219332
32666
501222192
32666
1
0


I'd like to know what is causing this.

If I print the array before MPI_Finalize is invoked it works fine.

Answer

The most important flaw your program has is how you divide the work. In MPI, every process is executing the main function. Therefore, you must ensure that all the processes execute your summation function if you want them to collaborate on building the result.

You don't need the for loop. Every process is executing the main separately. They just have different curr_proc values, and you can compute which portion of the job they have to perform based on that:

/* assigning jobs to processors */
int chunk_size = 1000 / total_proc;
lowerlimit = curr_proc * chunk_size;
upperlimit = (curr_proc+1) * chunk_size;
partial_sum = summation(upperlimit, lowerlimit);

Then, how the master process receives all the other processes' partial sum is not correct.

  • MPI rank values (curr_proc) start form 0 up to MPI_Comm_size output value (total_proc-1).
  • Only the process #1 is sending/receiving data.
  • You are using the immediate version of send and receive: MPI_Isend and MPI_recv but you are not waiting until those requests are completed. You should use MPI_Waitall for that purpose.

The correct version would be something like the following:

if( curr_proc == 0 ) {
   // master process receives all data
   for( int i = 1; i < total_proc; i++ )
      MPI_Recv( &array[i], MPI_INT, 1, i, 0, MPI_COMM_WORLD );
} else {
   // other processes send data to the master
   MPI_Send( &partial_sum, MPI_INT, 1, 0, 0, MPI_COMM_WORLD );
}

This all-to-one communication pattern is known as gather. In MPI there is a function which already performs this functionality: MPI_Gather.

Finally, what you intent to perform is known as reduction: take a given amount of numeric values and generate a single output value by continuously performing a single operation (a sum, in your case). In MPI there is a function which does that, too: MPI_Reduce.

I strongly suggest you to do some basic guided exercises before trying to make your own. MPI is difficult to understand at the beginning. Building a good base is vital for you to be able to add complexity later on. A hands on tutorial is also a good way of getting started into MPI.

EDIT: Forgot to mention that you don't need to enforce an even divission of the problem size (1000 in this case) by the number of resources (total_proc). Depending on the case, you can either assign the remainder to a single process:

chunk_size = 1000 / total_proc;
if( curr_proc == 0 )
    chunk_size += 1000 % total_proc;

Or balance it as much as possible:

int remainder = curr_proc < ( 1000 % proc )? 1 : 0;
lowerlimit = curr_proc * chunk_size /* as usual */
           + curr_proc;             /* cumulative remainder */
upperlimit = (curr_proc + 1) * chunk_size /* as usual */
           + remainder;                   /* curr_proc remainder */

The second case, the load unbalance will be as much as 1, while in the first case the load unbalance can reach total_proc-1 in the worst case.

Comments