Frederick Zhang - 1 month ago
C Question

OpenMP: How to correctly nest both MASTER and FOR in a PARALLEL block?

I am working on a program that uses both OpenMP and Open MPI.

For the process running on the initial node, I'd like one thread to act as a scheduler (interacting with the other nodes) while the others do the computations.

The code is structured like this:

int computation(...)
{
    #pragma omp parallel for .....
}

int main(...)
{
    ...
    if (mpi_rank == 0) // initial node
    {
        #pragma omp parallel
        {
            #pragma omp master
            {
                // task scheduling for other nodes
            }
            {
                // WRONG: with 4 threads in total, this block is executed by
                // 3 threads simultaneously, and the nested "for" in the function
                // spawns 4 threads each as well,
                // so there are ACTUALLY 3*4+1=13 threads here!
                computation(...);
            }
        }
    }
    else // other nodes
    {
        // get a task from node 0 scheduler by MPI
        computation(...);
    }
}


What I want is for the scheduler on the initial node to take one thread and for only one computation function to run at a time, so that at most 4 threads are in use simultaneously.

I also tried:

int computation(...)
{
    register int thread_use = omp_get_max_threads(); // this is 4
    if (mpi_rank == 0)
    {
        --thread_use; // if initial node, use 3
    }
    #pragma omp parallel for ..... num_threads(thread_use)
}

int main(...)
{
    ...
    if (mpi_rank == 0) // initial node
    {
        #pragma omp parallel
        {
            #pragma omp master
            {
                // task scheduling for other nodes
            }
            #pragma omp single
            {
                // WRONG: the nested "for" can only use 1 thread
                computation(...);
            }
        }
    }
    else // other nodes
    {
        // get a task from node 0 scheduler by MPI
        computation(...);
    }
}


...or

// other parts are the same as above
if (mpi_rank == 0) // initial node
{
    #pragma omp parallel num_threads(2)
    {
        #pragma omp master
        {
            // task scheduling for other nodes
        }
        {
            // WRONG: the nested "for" can only use 1 thread
            computation(...);
        }
    }
}


...but none of them worked.

How should I arrange the blocks with OpenMP to achieve my goal? Any help would be appreciated, thanks a lot.

Answer

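First of all, nested parallelism is typically disabled by default, so if you want to use it in OpenMP you need to enable it, for example by setting the environment variable OMP_NESTED to true. Note that OMP_NESTED is deprecated as of OpenMP 5.0 in favor of OMP_MAX_ACTIVE_LEVELS (here, OMP_MAX_ACTIVE_LEVELS=2).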
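Nested parallelism can also be requested from inside the program rather than through the environment. The following is only a minimal sketch, assuming a compiler that supports OpenMP 3.0 or later; the helper name enable_two_levels is hypothetical and not part of the original code. The #ifdef _OPENMP guards are there so the file still builds when OpenMP is switched off.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: ask the runtime to allow two nested active
 * parallel levels and report how many levels it actually allows.
 * The runtime may cap the value, so the caller should check it. */
int enable_two_levels(void)
{
#ifdef _OPENMP
    omp_set_max_active_levels(2);
    return omp_get_max_active_levels();
#else
    return 1; /* built without OpenMP: everything runs serially */
#endif
}
```

Calling enable_two_levels() once, before the first parallel region, should be enough; if it returns less than 2, inner regions will run with a single thread.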

Then, a possible implementation could look like the following:

// Parallel region. Topmost level
#pragma omp parallel sections num_threads(2)
{
    #pragma omp section
    scheduling_function();

    #pragma omp section
    compute_function();
}

Here scheduling_function() is a single-threaded function, and compute_function() is structured like this:

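void compute_function() {
    // Nested parallel region. Bottommost level
    // Here computation() is the per-thread work. If computation() already
    // contains its own "parallel for", as in the question, call it directly
    // instead of wrapping it in another parallel region.
    #pragma omp parallel
    {
        computation();
    }
}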
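Putting the two levels together, and capping the inner team so that the scheduler plus the workers stay within the thread budget from the question, might look like the sketch below. It is an illustration under assumptions, not the asker's actual program: run_rank0, compute_sum, N and the summation loop are made-up stand-ins for the real scheduler and computation, and MPI is left out entirely.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 1000 /* hypothetical problem size */

/* Stand-in for computation(): sums 1..N with an inner team of
 * nthreads threads (the inner, nested parallel level). */
static long compute_sum(int nthreads)
{
    long sum = 0;
    #pragma omp parallel for num_threads(nthreads) reduction(+:sum)
    for (int i = 1; i <= N; ++i)
        sum += i;
    return sum;
}

/* Stand-in for the rank 0 branch of main(): one section plays the
 * scheduler, the other runs the computation with the remaining threads. */
long run_rank0(int total_threads, int *scheduler_ran)
{
    long result = 0;
    /* Reserve one thread for the scheduler, keep at least one worker. */
    int workers = total_threads > 1 ? total_threads - 1 : 1;

#ifdef _OPENMP
    omp_set_max_active_levels(2); /* allow the inner team to be active */
#endif

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            *scheduler_ran = 1; /* placeholder for task scheduling */
        }
        #pragma omp section
        {
            result = compute_sum(workers);
        }
    }
    return result;
}
```

With total_threads set to 4, the outer level uses 2 threads and the thread running the second section becomes part of an inner team of 3, so at most 4 threads should be active at once. Passing the worker count explicitly through num_threads avoids relying on the default team size, which would otherwise apply to the inner region as well.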

More information on OpenMP nested parallelism
