boraas boraas - 7 days ago 5
C Question

OpenMp - Loop/array bounds for each thread

Is there an ICV (internal control variable) or something similar to query the upper and lower bounds of a loop in OpenMP?

The following calculation would give me the upper and lower bounds in some cases:

#pragma omp parallel for
for (i = 0; i < n; i++) {
    int this_thread = omp_get_thread_num();
    int num_threads = omp_get_num_threads();
    int lower_bound = (this_thread * n / num_threads);
    int upper_bound = ((this_thread + 1) * n / num_threads) - 1;
    ...
}


For n=100 I would get the correct lower_bound of 0, 25, 50 and 75 and upper_bound of 24, 49, 74 and 99 for the threads 0, 1, 2, 3.

If I change n to 99 it will give me incorrect bounds.
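
To make the n = 99 case concrete, here is a minimal sketch that evaluates the formula above on its own, outside any OpenMP construct. The helper name formula_bounds is hypothetical (it is not part of OpenMP), and a team of 4 threads is assumed, as in the n = 100 example.

```c
#include <assert.h>

/* Evaluate the question's formula for one thread (hypothetical helper).
 * Writes the inclusive range [*lo, *hi] that the formula predicts. */
static void formula_bounds(int n, int num_threads, int this_thread,
                           int *lo, int *hi)
{
    assert(n >= 0 && num_threads > 0);
    assert(this_thread >= 0 && this_thread < num_threads);
    *lo = this_thread * n / num_threads;
    *hi = (this_thread + 1) * n / num_threads - 1;
}
```

With n = 99 and 4 threads this predicts 0..23, 24..48, 49..73 and 74..98, that is, chunks of 24, 25, 25 and 25 iterations. An implementation is free to split 99 iterations differently (for example, by giving the larger chunks to the lowest-numbered threads), so the prediction can disagree with what the loop actually does.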

Does the calculation of the upper and lower bounds differ between the GCC and Intel compilers, or between C and C++?

Answer

No function in the OpenMP run-time library gives you this information. Furthermore, the bounds depend heavily on the schedule applied to the loop.

By default, in the absence of an explicit schedule clause, the schedule that is applied is implementation-defined (it is taken from the def-sched-var ICV) rather than fixed by the OpenMP standard. Many compilers use static scheduling, but that is not guaranteed.

Now, just to quote the OpenMP standard about static scheduling:

When schedule(static, chunk_size) is specified, iterations are divided into chunks of size chunk_size, and the chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number.

When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, and at most one chunk is distributed to each thread. The size of the chunks is unspecified in this case.

As you can see, even in this simple case, if no chunk size is given and the number of threads doesn't evenly divide the number of iterations, you cannot reliably determine the lower and upper bounds of each thread's iterations.

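If you specify the chunk size explicitly, however, the round-robin rule quoted above lets you compute each thread's iterations reliably. Keep in mind that a thread receives several non-contiguous chunks unless chunk_size * num_threads >= n.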
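
As a rough sketch of that idea, the code below picks a chunk size of ceil(n / num_threads), so that the round-robin rule hands each thread at most one contiguous chunk, and then checks the predicted bounds against the iterations each thread actually runs. The helper names (one_chunk_per_thread, static_chunk_bounds, count_out_of_range) are hypothetical, n is assumed to be positive, and the serial stubs are only there so the file still builds when OpenMP is not enabled.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also compiles without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Chunk size ceil(n / nthreads): each thread gets at most one chunk. */
static int one_chunk_per_thread(int n, int nthreads)
{
    assert(n > 0 && nthreads > 0);
    return (n + nthreads - 1) / nthreads;
}

/* Inclusive bounds [*lo, *hi] of thread tid under schedule(static, chunk)
 * with the chunk size above. The range is empty when *lo > *hi. */
static void static_chunk_bounds(int n, int nthreads, int tid,
                                int *lo, int *hi)
{
    int chunk = one_chunk_per_thread(n, nthreads);
    *lo = tid * chunk;
    *hi = (*lo + chunk < n ? *lo + chunk : n) - 1;
}

/* Run the loop and count iterations that executed outside the range
 * predicted for the executing thread. The expected result is 0. */
static int count_out_of_range(int n)
{
    int bad = 0;
    #pragma omp parallel reduction(+:bad)
    {
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
        int chunk = one_chunk_per_thread(n, nthreads);
        int lo, hi;
        static_chunk_bounds(n, nthreads, tid, &lo, &hi);

        #pragma omp for schedule(static, chunk)
        for (int i = 0; i < n; i++) {
            if (i < lo || i > hi)
                bad++;
        }
    }
    return bad;
}
```

For n = 99 and 4 threads the chunk size is 25, so the predicted ranges are 0..24, 25..49, 50..74 and 75..98. When there are more threads than chunks, the trailing threads get an empty range.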

Finally, if the schedule isn't static (dynamic or guided, for example), there is no way to infer in advance which thread will get which iteration, since the assignment is only decided at run time.
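
If the bounds must be known regardless of the schedule, one possible workaround is to drop the worksharing loop and partition the iteration space by hand inside a plain parallel region. The following is only a sketch of that approach: sum_by_hand is a hypothetical example function, the half-open range [lo, hi) is a choice made here for convenience, and the serial stubs exist only so the code builds without OpenMP.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also compiles without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Sum a[0..n-1], with every thread computing its own half-open
 * range [lo, hi) explicitly instead of relying on "omp for". */
static long sum_by_hand(const int *a, int n)
{
    long total = 0;
    assert(n >= 0);

    #pragma omp parallel reduction(+:total)
    {
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();

        /* Bounds are known by construction; long long avoids
         * overflow in the intermediate product. */
        int lo = (int)((long long)tid * n / nthreads);
        int hi = (int)((long long)(tid + 1) * n / nthreads);

        for (int i = lo; i < hi; i++)
            total += a[i];
    }
    return total;
}
```

Because hi of thread t equals lo of thread t + 1, the ranges cover 0..n-1 exactly once for any thread count, including teams larger than n, where some threads simply get an empty range.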