Jimmy - 5 months ago 38

R Question

I have a numeric vector x of length N and would like to create a vector of the within-set sums of all of the following sets: any possible combination of the x elements with at most M elements in each combination. I put together a slow iterative approach; what I am looking for here is a way without using any loops.

Consider the approach I have been taking, in the following example with N=5 and M=4

`M <- 4`

x <- 11:15

y <- as.matrix(expand.grid(rep(list(0:1), length(x))))

result <- y[rowSums(y) <= M, ] %*% x

However, as N gets large (above 22 for me), the expand.grid output becomes too big and gives an error (replace x above with x <- 11:55 to observe this). Ideally there would be an expand.grid function that permits restrictions on the rows before constructing the full matrix, which (at least for what I want) would keep the matrix size within memory limits.

Is there a way to achieve this without causing problems for large N?

Answer

Try this:

```
c(0, unlist(lapply(1:M, function(k) colSums(combn(x, k)))))
```

It generates the same result as with your expand.grid approach, shown below for the test data.

```
M <- 4
x <- 11:15
# expand.grid approach
y <- as.matrix(expand.grid(rep(list(0:1), length(x))))
result <- y[rowSums(y) <= M, ] %*% x
# combn approach
result1 <- c(0, unlist(lapply(1:M, function(k) colSums(combn(x, k)))))
all(sort(result[,1]) == sort(result1))
# [1] TRUE
```

This should be fast (it takes 0.227577 secs on my machine, with N=22, M=4):

```
x <- 1:22 # N = 22
M <- 4
c(0, unlist(lapply(1:M, function(k) colSums(combn(x, k)))))
# [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 3 4 5 6 7
```

you may want to choose the unique values of the sums with

```
unique(c(0, unlist(lapply(1:M, function(k) colSums(combn(x, k))))))
```