PooriR - 4 months ago
C Question

OpenMP parallel for schedule construct giving different answers every few program runs

I am trying to use OpenMP work-sharing constructs. The code below is a simplified example of what is going wrong in my larger OpenMP code. I assign values to an integer matrix, check the element values, reset them to 0, and repeat this in a 't' loop. The integer 'p' counts the number of times the value assignments (done by the parallel for) fail. p should be 0 if the code is correct, but it gives different answers on different runs, so the work-sharing construct is failing somewhere. I had to run it around 12 times before I got the first wrong value of p as output (1, 2, 3, etc.).

The barrier directives in the code aren't really necessary; I was getting different values of p without them and thought an explicit barrier would help, but I was wrong. This is the code:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define NRA 10 /* number of rows in matrix A */
#define NCA 10 /* number of columns in matrix A */

int main()
{
    int i, j, ir, p = 0, t;
    int *a;
    a = (int*) malloc(sizeof(int)*NRA*NCA);

    omp_set_num_threads(5);

    for(t=0;t<100000;t++)
    {
        #pragma omp barrier
        #pragma omp parallel for schedule (static,2) collapse(2)
        for(i=0;i<NRA;i++)
        {
            for(j=0;j<NCA;j++)
            {
                ir=j*NRA+i;
                a[ir] = 1;
            }
        }

        #pragma omp single
        {
            for(i=0;i<NRA;i++)
            {
                for(j=0;j<NCA;j++)
                {
                    ir=j*NRA+i;
                    if(a[ir] != 1)
                    {
                        p += 1;
                    }
                }
            }
        }

        #pragma omp parallel for schedule (static,2) collapse(2)
        for(i=0;i<NRA;i++)
        {
            for(j=0;j<NCA;j++)
            {
                ir=j*NRA+i;
                a[ir] = 0;
            }
        }

        #pragma omp barrier
    }//end t

    printf("p is %d\n",p);
}

Answer

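The issue is a race condition on ir. Since it is declared outside the parallel loop, it is implicitly shared: one thread can overwrite ir after another thread has computed it but before that thread uses it as an index into a, so one element gets written twice and another is left unset. (i and j are the loop variables of the collapsed loops, so OpenMP already makes them private.) You could force ir to be private with a private clause, but it is better to declare variables as locally as possible. That makes reasoning about OpenMP code much easier: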

#pragma omp parallel for schedule (static,2) collapse(2)
for(int i=0;i<NRA;i++)
{
    for(int j=0;j<NCA;j++)
    { 
        int ir = j*NRA+i; 
        a[ir] = 1; 
    }
}
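For comparison, the other option mentioned above (keeping the function-scope declarations and forcing ir to be private) might look like the sketch below. The function name fill_with_private_clause and the trailing serial check are illustrative assumptions, not part of the question's code. The block only uses pragmas, so it also compiles, serially, without OpenMP enabled.

```c
#include <stdio.h>
#include <stdlib.h>

#define NRA 10 /* number of rows in matrix A */
#define NCA 10 /* number of columns in matrix A */

/* Hypothetical helper (not from the question): fills a[] with 1 while keeping
 * the variables declared at function scope, as in the original code.
 * Returns the number of elements that are not 1 afterwards. */
int fill_with_private_clause(int *a)
{
    int i, j, ir;

    /* i and j belong to the collapsed loops, so they are private already;
     * ir is not, so it is listed in the private clause explicitly. */
    #pragma omp parallel for schedule(static, 2) collapse(2) private(ir)
    for (i = 0; i < NRA; i++)
    {
        for (j = 0; j < NCA; j++)
        {
            ir = j * NRA + i;
            a[ir] = 1;
        }
    }

    /* Serial check, outside the parallel region. */
    int missed = 0;
    for (ir = 0; ir < NRA * NCA; ir++)
    {
        if (a[ir] != 1)
        {
            missed += 1;
        }
    }
    return missed;
}
```

Calling fill_with_private_clause on a zeroed array of NRA*NCA ints should return 0. The drawback of this form is that the clause has to be repeated on every loop that uses ir, and forgetting it once brings the race back.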

As commented by Jorge Bellón, there are other issues in your code with respect to redundant barriers and efficiency.
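One possible reading of the barrier and efficiency remarks, sketched under assumptions: in the question, the barrier and single directives sit outside any parallel region, where they have no effect, and a new parallel region is started twice per t iteration. The version below opens one parallel region around the whole t loop and relies on the implicit barriers at the end of each for and single construct. The function name count_failures and its iters parameter are hypothetical, and this is not necessarily the restructuring the commenter had in mind.

```c
#include <stdio.h>
#include <stdlib.h>

#define NRA 10 /* number of rows in matrix A */
#define NCA 10 /* number of columns in matrix A */

/* Hypothetical helper: runs the fill / check / reset cycle `iters` times
 * and returns how many elements were found unset (-1 if malloc fails). */
int count_failures(int iters)
{
    int p = 0; /* shared, but only written inside the single construct */
    int *a = malloc(sizeof(int) * NRA * NCA);
    if (a == NULL)
    {
        return -1;
    }

    #pragma omp parallel num_threads(5)
    {
        /* Every thread runs the t loop; t is private because it is
         * declared inside the parallel region. */
        for (int t = 0; t < iters; t++)
        {
            #pragma omp for schedule(static, 2) collapse(2)
            for (int i = 0; i < NRA; i++)
            {
                for (int j = 0; j < NCA; j++)
                {
                    a[j * NRA + i] = 1;
                }
            } /* implicit barrier: all writes are done before the check */

            #pragma omp single
            {
                for (int k = 0; k < NRA * NCA; k++)
                {
                    if (a[k] != 1)
                    {
                        p += 1;
                    }
                }
            } /* implicit barrier: the check finishes before the reset */

            #pragma omp for schedule(static, 2) collapse(2)
            for (int i = 0; i < NRA; i++)
            {
                for (int j = 0; j < NCA; j++)
                {
                    a[j * NRA + i] = 0;
                }
            } /* implicit barrier: the reset finishes before the next fill */
        }
    }

    free(a);
    return p;
}
```

With all index variables declared inside the loops, count_failures(100000) should return 0 on every run. No explicit barrier directive is needed, because each work-sharing construct here already ends with one.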