ChuckP - 1 year ago 74
R Question

# Is `outer` fast enough for my double summation?

I'm trying to evaluate the following double sums in r:

I know

`outer`
is a quick way to do it. I've tried the following

``````sum(f(outer(X,X,function(x,y) (x-y)/c)))
``````

and while it seems to work, I'm not sure just how fast it is compare to some alternatives? Does it make a difference in terms of speed to do
`outer`
first then my function or vice versa? is there a better way to do it?

I would like to point out first that you can write you code as

``````sum(f(outer(x, x, "-") / c))
``````

This reduces function call overhead, as subtraction in R is already a function. Try `"-"(5, 2)`.

`outer` is fast enough for your application. The only case when it is suboptimal is when your function `f` is symmetric around 0, i.e., `f(-u) = f(u)`. In this case, optimal computation only sums over the lower triangular of combination matrix `outer(x, x, "-")`, and multiply the sum by 2 for summation over off-diagonals. Finally, diagonal results are added.

The following function does this. We generate `(i, j)` indices for the lower triangular part (excluding diagonal) of the combination matrix, then the lower triangular part of `outer(x, x, "-") / c` would be `dx <- (x[i] - x[j]) / c`. Now,

• if `f` is symmetric, the result is `2 * sum(f(dx)) + n * f(0)`, and this is faster than `outer`;
• if `f` is asymmetric, we have to do `sum(f(dx)) + sum(f(-dx)) + n * f(0)`, and this won't have any advantage over `outer`.

``````## `x` is the vector, `f` is your function of interest, `c` is a constant
## `symmetric` is a switch; only set `TRUE` when `f` is symmetric around 0
g <- function (x, f, c, symmetric = FALSE) {
n <- length(x)
j <- rep.int(1:(n-1), (n-1):1)
i <- sequence((n-1):1) + j
dx <- (x[i] - x[j]) / c
if (symmetric) 2 * sum(f(dx)) + n * f(0)
else sum(f(dx)) + sum(f(-dx)) + n * f(0)
}
``````

Consider a small example here. Let's assume `c = 2` and a vector `x <- 1:500`. We also consider a symmetric function `f1 <- cos` and an asymmetric function `f2 <- sin`. Let's do a benchmark:

``````x <- 1:500
library(microbenchmark)
``````

We first consider the symmetric case with `f1`. Remember to set `symmetric = TRUE` for `g`.

``````microbenchmark(sum(f1(outer(x,x,"-")/2)), g(x, f1, 2, TRUE))

#Unit: milliseconds
#                        expr      min       lq     mean   median       uq
# sum(f2(outer(x, x, "-")/2)) 32.79472 35.35316 46.91560 36.78152 37.63580
#           g(x, f2, 2, TRUE) 20.24940 23.34324 29.97313 24.45638 25.33352
#      max neval cld
# 133.5494   100   b
# 120.3278   100  a
``````

Here we see that `g` is faster.

Now consider the asymmetric case with `f2`.

``````microbenchmark(sum(f2(outer(x,x,"-")/2)), g(x, f2, 2))

#Unit: milliseconds
#                        expr      min       lq     mean   median       uq
# sum(f2(outer(x, x, "-")/2)) 32.84412 35.55520 44.33684 36.95336 37.89508
#                 g(x, f2, 2) 36.71572 39.11832 50.54516 40.25590 41.75060
#      max neval cld
# 134.2991   100   a
# 142.5143   100   a
``````

As expected, there is no advantage here.

Yes, we also want to check that `g` is doing the correct computation. It is sufficient to consider a small example, with `x <- 1:5`.

``````x <- 1:5

#### symmetric case ####

sum(f1(outer(x, x, "-") / 2))
# [1] 14.71313

g(x, f1, 2, TRUE)
# [1] 14.71313

#### asymmetric case ####

sum(f2(outer(x, x, "-") / 2))
# [1] 0

g(x, f2, 2)
# [1] 0
``````

So `g` is correct.