ChuckP - 7 months ago 53

R Question

I'm trying to evaluate the following double sums in r:

I know

`outer`

`sum(f(outer(X,X,function(x,y) (x-y)/c)))`

and while it seems to work, I'm not sure just how fast it is compare to some alternatives? Does it make a difference in terms of speed to do

`outer`

Answer

I would like to point out first that you can write you code as

```
sum(f(outer(x, x, "-") / c))
```

This reduces function call overhead, as subtraction in R is already a function. Try `"-"(5, 2)`

.

`outer`

is fast enough for your application. The only case when it is suboptimal is when your function `f`

is symmetric around 0, i.e., `f(-u) = f(u)`

. In this case, optimal computation only sums over the lower triangular of combination matrix `outer(x, x, "-")`

, and multiply the sum by 2 for summation over off-diagonals. Finally, diagonal results are added.

The following function does this. We generate `(i, j)`

indices for the lower triangular part (excluding diagonal) of the combination matrix, then the lower triangular part of `outer(x, x, "-") / c`

would be `dx <- (x[i] - x[j]) / c`

. Now,

- if
`f`

is symmetric, the result is`2 * sum(f(dx)) + n * f(0)`

, and this is faster than`outer`

; - if
`f`

is asymmetric, we have to do`sum(f(dx)) + sum(f(-dx)) + n * f(0)`

, and this won't have any advantage over`outer`

.

```
## `x` is the vector, `f` is your function of interest, `c` is a constant
## `symmetric` is a switch; only set `TRUE` when `f` is symmetric around 0
g <- function (x, f, c, symmetric = FALSE) {
n <- length(x)
j <- rep.int(1:(n-1), (n-1):1)
i <- sequence((n-1):1) + j
dx <- (x[i] - x[j]) / c
if (symmetric) 2 * sum(f(dx)) + n * f(0)
else sum(f(dx)) + sum(f(-dx)) + n * f(0)
}
```

Consider a small example here. Let's assume `c = 2`

and a vector `x <- 1:500`

. We also consider a symmetric function `f1 <- cos`

and an asymmetric function `f2 <- sin`

. Let's do a benchmark:

```
x <- 1:500
library(microbenchmark)
```

We first consider the symmetric case with `f1`

. Remember to set `symmetric = TRUE`

for `g`

.

```
microbenchmark(sum(f1(outer(x,x,"-")/2)), g(x, f1, 2, TRUE))
#Unit: milliseconds
# expr min lq mean median uq
# sum(f2(outer(x, x, "-")/2)) 32.79472 35.35316 46.91560 36.78152 37.63580
# g(x, f2, 2, TRUE) 20.24940 23.34324 29.97313 24.45638 25.33352
# max neval cld
# 133.5494 100 b
# 120.3278 100 a
```

Here we see that `g`

is faster.

Now consider the asymmetric case with `f2`

.

```
microbenchmark(sum(f2(outer(x,x,"-")/2)), g(x, f2, 2))
#Unit: milliseconds
# expr min lq mean median uq
# sum(f2(outer(x, x, "-")/2)) 32.84412 35.55520 44.33684 36.95336 37.89508
# g(x, f2, 2) 36.71572 39.11832 50.54516 40.25590 41.75060
# max neval cld
# 134.2991 100 a
# 142.5143 100 a
```

As expected, there is no advantage here.

Yes, we also want to check that `g`

is doing the correct computation. It is sufficient to consider a small example, with `x <- 1:5`

.

```
x <- 1:5
#### symmetric case ####
sum(f1(outer(x, x, "-") / 2))
# [1] 14.71313
g(x, f1, 2, TRUE)
# [1] 14.71313
#### asymmetric case ####
sum(f2(outer(x, x, "-") / 2))
# [1] 0
g(x, f2, 2)
# [1] 0
```

So `g`

is correct.