user2018675 - 1 year ago 49

C Question

I was given some code to paralellize using OpenMP and, among the various function calls, I noticed this

`for`

`double U[n][n];`

double L[n][n];

double Aprime[n][n];

for(i=0; i<n; i++) {

for(j=0; j<n; j++) {

if (j <= i) {

double s;

s=0;

for(k=0; k<j; k++) {

s += L[j][k] * U[k][i];

}

U[j][i] = Aprime[j][i] - s;

} else if (j >= i) {

double s;

s=0;

for(k=0; k<i; k++) {

s += L[j][k] * U[k][i];

}

L[j][i] = (Aprime[j][i] - s) / U[i][i];

}

}

However, after trying to parallelize it and applying some semaphores here and there (with no luck), I came to the realization that the

`else if`

`if`

`L[j][i]`

`U[i][i]`

`if`

Is it possible to parallelize this code in such a manner to make the

`else if`

`if`

Answer Source

Before trying to parallelize things, try simplification first.

For example, the `if`

can be completely eliminated.

Also, the code is accessing the matrixes in a way that causes *worst* cache performance. That may be the *real* bottleneck.

I've cleaned things up in stages, so you can see the code transformations.

With this, it should be possible to add `omp`

directives successfully. As I mentioned in my top comment, the global vs. function scope of the variables affects the type of update that may be required (e.g. `omp atomic update`

, etc.)

For reference, here is your original code:

```
double U[n][n];
double L[n][n];
double Aprime[n][n];
for (i = 0; i < n; i++) {
for (j = 0; j < n; j++) {
if (j <= i) {
double s;
s = 0;
for (k = 0; k < j; k++) {
s += L[j][k] * U[k][i];
}
U[j][i] = Aprime[j][i] - s;
}
else if (j >= i) {
double s;
s = 0;
for (k = 0; k < i; k++) {
s += L[j][k] * U[k][i];
}
L[j][i] = (Aprime[j][i] - s) / U[i][i];
}
}
}
```

The `else if (j >= i)`

was unnecessary and could be replaced with just `else`

. But, we can split the `j`

loop into two loops so that neither needs an `if/else`

:

```
// fix2.c -- split up j's loop to eliminate if/else inside
double U[n][n];
double L[n][n];
double Aprime[n][n];
for (i = 0; i < n; i++) {
for (j = 0; j <= i; j++) {
double s = 0;
for (k = 0; k < j; k++)
s += L[j][k] * U[k][i];
U[j][i] = Aprime[j][i] - s;
}
for (; j < n; j++) {
double s = 0;
for (k = 0; k < i; k++)
s += L[j][k] * U[k][i];
L[j][i] = (Aprime[j][i] - s) / U[i][i];
}
}
```

`U[i][i]`

is invariant in the second `j`

loop, so we can presave it:

```
// fix3.c -- save off value of U[i][i]
double U[n][n];
double L[n][n];
double Aprime[n][n];
for (i = 0; i < n; i++) {
for (j = 0; j <= i; j++) {
double s = 0;
for (k = 0; k < j; k++)
s += L[j][k] * U[k][i];
U[j][i] = Aprime[j][i] - s;
}
double Uii = U[i][i];
for (; j < n; j++) {
double s = 0;
for (k = 0; k < i; k++)
s += L[j][k] * U[k][i];
L[j][i] = (Aprime[j][i] - s) / Uii;
}
}
```

The access to the matrixes is done in probably the worst way for cache performance. So, if the assignment of dimensions can be flipped, a substantial savings in memory access can be achieved:

```
// fix4.c -- transpose matrix coordinates to get _much_ better memory/cache
// performance
double U[n][n];
double L[n][n];
double Aprime[n][n];
for (i = 0; i < n; i++) {
for (j = 0; j <= i; j++) {
double s = 0;
for (k = 0; k < j; k++)
s += L[k][j] * U[i][k];
U[i][j] = Aprime[i][j] - s;
}
double Uii = U[i][i];
for (; j < n; j++) {
double s = 0;
for (k = 0; k < i; k++)
s += L[k][j] * U[i][k];
L[i][j] = (Aprime[i][j] - s) / Uii;
}
}
```

**UPDATE:**

In the Op's first k-loop its

`k<j`

and in the 2nd`k<i`

don't you have to fix that?

Yes, I've fixed it. It was too ugly a change for `fix1.c`

, so I removed that and applied the changes to `fix2-fix4`

where it was easy to do.