Petros17 Petros17 - 12 days ago 4
C Question

OpenMP - how to efficiently synchronize field update

I have the following code:

for (int i = 0; i < veryLargeArraySize; i++){
int value = A[i];
if (B[value] < MAX_VALUE) {
B[value]++;
}
}


I want to use OpenMP worksharing construct here, but my issue is the synchronization on the B array - all parallel threads can access any element of array B, which is very large (which made use of locks difficult since I'd need too many of them)

#pragma omp critical is a serious overhead here. Atomic is not possible, because of the
if
.

Does anyone have a good suggestion on how I might do this?

Answer

Here's what I've found out and done.

I've read on some forums that parallel histogram calculation is generally a bad idea, since it may be slower and less efficient than the sequential calculation.

However, I needed to do it (for the assignment), so what I did is the following:

  1. Parallel processing of the A array(the image) to determine the actual range of values (the histogram - B array) - find MIN and MAX of A[i]

    int min_value, max_value;
    #pragma omp for reduction(min:min_value), reduction(max:max_value)
    for (i = 0; i < veryLargeArraySize; i++){
         const unsigned int value = A[i];
         if(max_value < value) max_value = value;
         if(min_value > value) min_value = value;
    }
    
    int size_of_histo = max_value - min_value + 1;`
    
  2. That way, we can (potentially) reduce the actual histogram size from, e.g., 1M elements (allocated in array B) to 50K elements (allocated in sharedHisto)
  3. Allocate a shared array, such as:

    int num_threads = omp_get_num_threads();
    int* sharedHisto = (int*) calloc(num_threads * size_of_histo, sizeof(int));
    
  4. Each thread is assigned a part of the sharedHisto, and can update it without synchronization

    int my_id = omp_get_thread_num();
    #pragma omp parallel for default(shared) private(i)
    for(i = 0; i < veryLargeArraySize; i++){
        int value = A[i];
        // my_id * size_of_histo positions to the begining of this thread's
        // part of sharedHisto .
        // i - min_value positions to the actual histo value
        sharedHisto[my_id * size_of_histo + i - min_value]++;
    }
    
  5. Now, perform a reduction (as stated here: Reducing on array in OpenMp)

    #pragma omp parallel
    {
       // Every thread is in charge for part of the reduced histogram
       // shared_histo with the size: size_of_histo
       int my_id = omp_get_thread_num();
       int num_threads = omp_get_num_threads();
       int chunk = (size_of_histo + num_threads - 1) / num_threads;
       int start = my_id * chunk;
       int end = (start + chunk > histo_dim) ? histo_dim : start + chunk;
    
       #pragma omp for default(shared) private(i, j)
       for(i = start; i < end; i++){
           for(j = 0; j < num_threads; j++){
               int value = B[i + minHistoValue] + sharedHisto[j * size_of_histo + i];
    
               if(value > MAX_VALUE) B[i + min_value] = MAX_VALUE;
               else B[i + min_value] = value;
           }
        }
     }
    
Comments