algoProg - 21 days ago
C++ Question

OpenMP: pragma cancel for on NUMA

---------------------EDIT-------------------------

I have edited the code as follows:

#pragma omp parallel for private(i, piold, err) shared(threshold_err) reduction(+:pi) schedule (static)
{
    for (i = 0; i < 10000000000; i++){ //1000000000//705035067
        piold = pi;
        pi += (((i&1) == false) ? 1.0 : -1.0)/(2*i+1);
        err = fabs(pi-piold);
        if ( err < threshold_err){
#pragma omp cancel for
        }

    }
}
pi = 4*pi;


I compile it with LLVM 3.9/Clang 4.0. When I run it with one thread, I get the expected results with the pragma cancel in effect (checked against the version without pragma cancel; the cancel version runs faster).

But when I run it with two or more threads, the program seems to get stuck in the loop. I am running the code on NUMA machines. What is happening? Perhaps the cancel condition is not being satisfied! But then the code takes longer than the single-threaded version without pragma cancel!! FYI, it runs fine when OMP_CANCELLATION=false.




I have the following OpenMP code, which I compile with LLVM-3.9/Clang-4.0.

#pragma omp parallel private(i, piold, err) shared(pi, threshold_err)
{
#pragma omp for reduction(+:pi) schedule (static)
    for (i = 0; i < 10000000 ; i++){
        piold = pi;
        pi += (((i&1) == false) ? 1.0 : -1.0)/(2*i+1);
#pragma omp critical
        {
            err = fabs(pi-piold);// printf("Err: %0.11f\n", err);
        }
        if ( err < threshold_err){
            printf("Cancelling!\n");
#pragma omp cancel for
        }

    }
}


Unfortunately, I do not think that #pragma omp cancel for is terminating the whole for loop. I print the value of err at the end, but with parallelism it is unclear which thread's value is being printed. The final value of err is smaller than threshold_err. The "Cancelling!" message is printed, but at the very beginning of the program, which is surprising. The program keeps running after that!

How can I make sure that this is a correct implementation? BTW, OMP_CANCELLATION is set to true, and a small test program returns '1' from the corresponding function, omp_get_cancellation().

Answer

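As I understand it, omp cancel is only a stop request, not an immediate break of the whole loop. The thread that encounters it leaves the loop, but the other threads keep running until they themselves reach a cancellation point (another cancel or a cancellation point construct), so iterations that are already in progress still run to the end. See http://bisqwit.iki.fi/story/howto/openmp/ and http://jakascorner.com/blog/2016/08/omp-cancel.html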

In my opinion, your program already produces an acceptable approximation. However, some variables can be kept in a smaller scope. This is my suggestion:

#include <iostream>
#include <cmath>
#include <iomanip>

int main() {

    long double pi = 0.0;
    long double threshold_err = 1e-7;
    int cancelFre = 0;

#pragma omp parallel shared(pi, threshold_err, cancelFre)
    {
#pragma omp for reduction(+:pi) schedule (static)
        for (int i = 0; i < 100000000; i++){
            long double piold = pi;
            pi += (((i&1) == false) ? 1.0 : -1.0)/(2*i+1);
            long double err = std::fabs(pi-piold);
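            if ( err < threshold_err){
                // Count before cancelling: the cancelling thread jumps to the end of
                // the loop, so a statement placed after the pragma would be skipped.
#pragma omp atomic
                cancelFre++;
#pragma omp cancel for
            }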

        }
    }

    std::cout << std::setprecision(10) << pi * 4 << " " << cancelFre;

    return 0;
}