zerge zerge - 1 month ago 8
C++ Question

C++ OpenMP computation errors with private and shared clause

I have a for loop to be parallelized with OpenMP, but there are multiple computational errors, probably due to my lack of understanding of the concept of multithreading with OpenMP:

for ( int i = -X/2; i < X/2; ++i )
{
    base.y = anchor + i*rho_step;
    temp = some_function( base );
    if( temp > response )
    {
        buffer.y = base.y;
        response = temp;
    }
}


This works fine. Then I made the following changes:

#pragma omp parallel for shared (buffer, response) private(base, temp)
for ( int i = -X/2; i < X/2; ++i )
{
    base.y = anchor + i*rho_step;
    temp = some_function( base );
    if( temp > response )
    {
        buffer.y = base.y;
        response = temp;
    }
}


In this code, neither buffer.y nor response will have the correct values. In my understanding, every thread should have its own copy of base.y and temp, since they are only temporary variables for the computation, and buffer and response must be shared (they will store the computed data), but this does not work as I would expect.

The only version that gives correct results is the following, but obviously there is no performance increase, because the lock serializes the whole loop body:

omp_lock_t writelock;
omp_init_lock(&writelock);
omp_set_num_threads (4);

#pragma omp parallel for
for ( int i = -X/2; i < X/2; ++i )
{
    omp_set_lock(&writelock);
    base.y = anchor + i*rho_step;
    temp = some_function( base );
    if( temp > response )
    {
        buffer.y = base.y;
        response = temp;
    }
    omp_unset_lock(&writelock);
}
omp_destroy_lock(&writelock);


What can be the problem? (anchor and rho_step are constants in this loop)

Answer Source

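In your second version, all threads read and write the shared buffer and response without any synchronization, so the test temp > response and the two assignments that follow it form a data race: one thread can overwrite a better result that another thread stored in between. To handle this cross-thread dependency, you'll need per-thread local copies of buffer and response, and a final reduction that merges them into their shared counterparts. Note also that private( base ) does not copy the original's values into each thread's copy, which is why firstprivate( base ) is used here.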

Here is what it would look like (not tested):

#pragma omp parallel firstprivate( base )
{
    auto localResponse = response;
    auto localBuffer = buffer;
    #pragma omp for
    for ( int i = -X/2; i < X/2; ++i )
    {
        base.y = anchor + i * rho_step;
        auto temp = some_function( base );
        if ( temp > localResponse )
        {
            localBuffer.y = base.y;
            localResponse = temp;
        }
    }
    #pragma omp critical
    {
        if ( localResponse > response )
        {
            buffer.y = localBuffer.y;
            response = localResponse;
        }
    }
}
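
For anyone who wants to try the pattern in isolation, here is a minimal self-contained sketch of the same idea. Everything not in the question is an assumption made for illustration: the Point and Best structs, the find_best wrapper, and the body of some_function (a stand-in that peaks at y == 3). Each thread keeps its own best candidate and merges it into the shared result inside a critical section, so the expensive calls run in parallel and only the cheap final comparison is serialized.

```cpp
#include <cmath>

// Hypothetical stand-in for the type of base and buffer in the question.
struct Point { double x; double y; };

// Hypothetical scoring function for illustration only: largest at y == 3.
double some_function(const Point& p) { return -std::fabs(p.y - 3.0); }

// Hypothetical result type: the best y found and its response.
struct Best { double y; double response; };

Best find_best(int X, double anchor, double rho_step, double initial_response)
{
    Best best{0.0, initial_response};      // shared result

    #pragma omp parallel
    {
        // Per-thread candidate, initialized from a read-only argument.
        Best local{0.0, initial_response};

        #pragma omp for
        for (int i = -X / 2; i < X / 2; ++i)
        {
            // Declared inside the loop, so each iteration has its own copy.
            Point base{0.0, anchor + i * rho_step};
            double temp = some_function(base);
            if (temp > local.response)
            {
                local.y = base.y;
                local.response = temp;
            }
        }

        // Merge step: one thread at a time updates the shared result.
        #pragma omp critical
        {
            if (local.response > best.response)
                best = local;
        }
    }
    return best;
}
```

Calling find_best(20, 0.0, 0.5, -1e9) scans y from -5 to 4.5 in steps of 0.5. One caveat: if several values of i give exactly the same maximal response, the serial loop always keeps the first one, while the parallel version keeps whichever thread's candidate is merged first, so the reported y may differ between runs. Compile with your compiler's OpenMP flag (for example -fopenmp with GCC); without it the pragmas are ignored and the code runs serially with the same result.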