drjrm3 - 1 year ago 81

R Question

I'm hoping to vectorize the following loop:

`for (i in 1:n) {`

for (j in 1:m) {

temp_mat[i,j]=min(temp_mat[i,j],1);

}

}

I thought I could do

`temp_mat=min(temp_mat,1)`

Answer Source

Just use `temp_mat <- pmin(temp_mat, 1)`

. See `?pmin`

for more use of parallel minima.

Example:

```
set.seed(0); A <- matrix(sample(1:3, 25, replace = T), 5)
#> A
# [,1] [,2] [,3] [,4] [,5]
#[1,] 3 1 1 3 3
#[2,] 1 3 1 2 3
#[3,] 2 3 1 3 1
#[4,] 2 2 3 3 2
#[5,] 3 2 2 2 1
B <- pmin(A, 2)
#> B
# [,1] [,2] [,3] [,4] [,5]
#[1,] 2 1 1 2 2
#[2,] 1 2 1 2 2
#[3,] 2 2 1 2 1
#[4,] 2 2 2 2 2
#[5,] 2 2 2 2 1
```

Since you have background in computational science, I would like to provide more information.

`pmin`

is fast, but is far from high performance. Its prefix "parallel" only suggests `element-wise`

. The meaning of "vectorization" in R is not the same as "SIMD vectorization" in HPC. R is an interpreted language, so **"vectorization" in R means opting for C level loop rather than R level loop.** Therefore, `pmin`

is just coded with a trivial C loop.

**Real high performance computing should benefit from SIMD vectorization.** I believe you know SSE/AVX intrinsics. So if you write a simple C code, using `_mm_min_pd`

from `SSE2`

, you will get ~2 times speedup from `pmin`

; if you see `_mm256_min_pd`

from AVX, you will get ~4 times speedup from `pmin`

.

**Unfortunately, R itself can not do any SIMD.** I have an answer to a post at Does R leverage SIMD when doing vectorized calculations? regarding this issue. For your question, even if you link your R to a HPC BLAS, `pmin`

will not benefit from SIMD, simply because `pmin`

does not involve any BLAS operations. So a better bet is to write compiled code yourself.