Greg Peckory - 1 month ago
C Question

MPI Gather only gathering from Root Process

First off, I've been using this code as a reference, which shows the use of MPI_Gather without MPI_Scatter, as that is what I am trying to achieve here. I've been working on this for a long time now and just can't figure out the issue. This Sobel edge detection algorithm strengthens the outlines of objects inside images.

I will post my code below, as there is not too much, but I'll give a quick code description first.

I am trying to convert a working sequential program into a parallel one, so all of the non-MPI code is correct.

That means the mistake can only be somewhere in my MPI code.

int main(int argc, char **argv) {

    FILE *inFile, *oFile;
    int grayImage[N][N], edgeImage[N][N];
    char type[2];
    int w, h, max;
    int r, g, b, y, x, i, j, sum, sumx, sumy;
    int tid;

    int GX[3][3], GY[3][3];
    double elapsed_time;
    struct timeval tv1, tv2;
    int error = 0;
    char buffer[BUFSIZ];
    int rank, NP;

    // Code lies here for reading from the image file and storing into the grayImage array.
    // This works so I saw no reason to include it

    /* 3x3 Sobel masks. */
    GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
    GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
    GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

    GY[0][0] =  1; GY[0][1] =  2; GY[0][2] =  1;
    GY[1][0] =  0; GY[1][1] =  0; GY[1][2] =  0;
    GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;

    MPI_Init(NULL, NULL);

    MPI_Comm_size(MPI_COMM_WORLD, &NP);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // This calculates the block size.
    // On 4 processors the block size for a 100x100 image would be 25x100 each

    int blksz = (int)ceil((double)N/NP);

    // This creates a local array for each processor, soon to be gathered

    int tempEdge[blksz][N];

    // this line shows it's working correctly

    printf("processor %d, width: %d, height: %d, blksz: %d, begin: %d, end: %d\n",
           rank, w, h, blksz, rank*blksz, (rank+1)*blksz);

    for(x=rank*blksz; x < (rank+1)*blksz && x<h; x++){

        // Any code in this loop can be ignored as it works correctly.

        for(y=0; y < w; ++y){

            sumx = 0;
            sumy = 0;
            // handle image boundaries
            if(x==0 || x==(h-1) || y==0 || y==(w-1))
                sum = 0;
            else{
                // x gradient approx
                for(i=-1; i<=1; i++) {
                    for(j=-1; j<=1; j++){
                        sumx += (grayImage[x+i][y+j] * GX[i+1][j+1]);
                    }
                }
                // y gradient approx
                for(i=-1; i<=1; i++) {
                    for(j=-1; j<=1; j++){
                        sumy += (grayImage[x+i][y+j] * GY[i+1][j+1]);
                    }
                }
                // gradient magnitude approx
                sum = (abs(sumx) + abs(sumy));
            }
            tempEdge[x][y] = clamp(sum);
        }
    }

    // Here is the line I am guessing is causing the problem

    MPI_Gather(&tempEdge, w*blksz, MPI_INT,
               &edgeImage, w*blksz, MPI_INT, 0,
               MPI_COMM_WORLD);

    // Finally, I output edgeImage to a file here.

    if(rank==0){

        // output edgeImage to File

    }

    MPI_Finalize();

    return 0;
}


The input image I am using is this:

[input image]

But the output is only giving this:

[output image]

As you can see, it is only the top quarter of the image (N/4 rows, i.e. blksz).

This would imply that MPI_Gather is only gathering from the process with rank 0?

I've been spending so much time on this; any help would be hugely appreciated!

Answer

Do not blame MPI collectives for bugs in the rest of your code. It is actually a miracle that your code produces broken images without segfaulting. Just take a look at this part:

int tempEdge[blksz][N];
             ~~~~~

for(x = rank*blksz; x < (rank+1)*blksz && x<h; x++){
        ~~~~~~~~~~
   for(y = 0; y < w; ++y){
      ...
      tempEdge[x][y] = clamp(sum);    (1)
               ~
   }
}

tempEdge has only blksz rows, indexed 0 through blksz-1, but x starts at rank*blksz, so for any rank > 0 the code writes past the end of the array. Fix the statement at (1) to read:

tempEdge[x - rank*blksz][y] = clamp(sum);
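
One way to keep that readable is to compute a local row index once per iteration. This is a sketch of the loop skeleton only; the Sobel body stays exactly as you have it:

int lx;                                // local row index into tempEdge
for(x = rank*blksz; x < (rank+1)*blksz && x < h; x++){
    lx = x - rank*blksz;               // maps x to 0 .. blksz-1 on every rank
    for(y = 0; y < w; ++y){
       ...
       tempEdge[lx][y] = clamp(sum);
    }
}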

Also, remove the &s in the MPI_Gather call:

MPI_Gather(tempEdge, w*blksz, MPI_INT,
           edgeImage, w*blksz, MPI_INT, 0,
           MPI_COMM_WORLD);

It will work with & too, since the address value is the same either way, but the pointer types are technically incorrect. If you insist on using the address-of operator, then use &tempEdge[0][0] and &edgeImage[0][0] instead.
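
If it helps, here is a minimal, self-contained sketch of the corrected pattern: a row-block decomposition gathered into a 2D array on rank 0. It assumes the row count divides evenly by the number of processes, which your fixed per-rank count of w*blksz elements also quietly requires; ROWS and COLS are placeholder names, not from your program:

#include <stdio.h>
#include <mpi.h>

#define ROWS 8
#define COLS 8

int main(int argc, char **argv)
{
    int rank, NP;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &NP);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int blksz = ROWS / NP;          // assumes ROWS % NP == 0
    int local[blksz][COLS];         // per-rank block, indexed 0 .. blksz-1
    int full[ROWS][COLS];           // only meaningful on rank 0

    // Fill the local block using LOCAL row indices.
    for (int lx = 0; lx < blksz; lx++)
        for (int y = 0; y < COLS; y++)
            local[lx][y] = rank;    // tag every element with its rank

    // Blocks arrive in rank order, so full[] ends up partitioned by rows.
    MPI_Gather(local, blksz*COLS, MPI_INT,
               full, blksz*COLS, MPI_INT, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        for (int x = 0; x < ROWS; x++)
            printf("row %d written by rank %d\n", x, full[x][0]);

    MPI_Finalize();
    return 0;
}

Run with, e.g., mpirun -np 4 ./a.out; each quarter of the rows should report a different rank, which is exactly what the original code only managed for rank 0's block.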
