CptNemo - 1 year ago 58
R Question

# R: Compute on previous elements of an ordered vector

Given an ordered vector `vec <- c(1, 4, 6, 3, 2, 7)`, I want to compute for each element `i` of `vec` the weighted average of the previous elements, where the weight is the inverse of the distance from element `i`.

The function should proceed as follows.

• For the first element `1`, it should return `NA` (no previous elements).

• For the second element `4`, it should return `1`.

• For the third element `6`, it should return `weighted.mean(x = c(1, 4), w = c(1, 2))`.

• For the fourth element `3`, it should return `weighted.mean(x = c(1, 4, 6), w = c(1, 2, 3))`.

The resulting vector `result`, with `length(result) == length(vec)`, should be `c(NA, 1, 3, 4.5, 3.9, 3.266667)`.

UPDATE: I clearly mean without using a loop such as the following.

```r
result <- numeric()

for (i in 1:length(vec)) {
  if (i == 1) {
    result <- c(result, NA)
  } else {
    previous_elements <- vec[1:(i - 1)]
    result <- c(result,
                weighted.mean(x = previous_elements,
                              w = 1:length(previous_elements)))
  }
}
```

Answer Source

Here's a naive implementation. Create a function that does what you describe; the only 'clever' thing is to use `seq_len(i - 1)` instead of `1:(i - 1)` to generate the indexes, so that for the first element you get a zero-length index vector rather than the surprising `1:0`, i.e., `c(1, 0)`.

```r
fun <- function(i, vec)
    weighted.mean(head(vec, i - 1), w = seq_len(i - 1))
```

and then use it in `sapply()`

```r
sapply(seq_along(vec), fun, vec)
```
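Running this on the example vector reproduces the expected values; a self-contained check:

```r
vec <- c(1, 4, 6, 3, 2, 7)

## the helper from above: weighted mean of the elements before position i
fun <- function(i, vec)
    weighted.mean(head(vec, i - 1), w = seq_len(i - 1))

sapply(seq_along(vec), fun, vec)
## NaN 1.000000 3.000000 4.500000 3.900000 3.266667
```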

This is good enough -- NaN as the first element, rather than NA, but that's easily corrected after the fact (or conceptually accepted as the right answer). It's also better than your solution, but still 'using a loop' -- the management of the result vector is done by `sapply()`, rather than in your loop where you have to manage it yourself. And in particular your 'copy and append' approach is very bad performance-wise, making a copy of the existing result each time through the loop. It's better to pre-allocate a result vector of the appropriate length `result = numeric(length(vec))` and then fill it `result[[i]] = ...`, and better still to just let `sapply()` do the right thing for you!
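To illustrate the pre-allocation point, here is a minimal sketch of the question's loop rewritten with a pre-allocated result vector (same logic, no copy-and-append):

```r
vec <- c(1, 4, 6, 3, 2, 7)

## allocate the full-length result once, then fill it in place;
## this avoids copying the growing vector on every iteration
result <- numeric(length(vec))
result[[1]] <- NA
for (i in 2:length(vec)) {
    previous <- vec[seq_len(i - 1)]
    result[[i]] <- weighted.mean(previous, w = seq_len(i - 1))
}
result
## NA 1.000000 3.000000 4.500000 3.900000 3.266667
```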

The problem is that it scales quadratically -- you make a pass along `vec` to process each element, and then for each element you make a second pass to calculate the weighted mean, so there are `n (n - 1) / 2` calculations. So...

Take a look at `weighted.mean`

```r
> stats:::weighted.mean.default
function (x, w, ..., na.rm = FALSE)
{
    ## SNIP -- edited for brevity
    w <- as.double(w)
    if (na.rm) {
        i <- !is.na(x)
        w <- w[i]
        x <- x[i]
    }
    sum((x * w)[w != 0])/sum(w)
}
```

and use `cumsum()` instead of `sum()` to get cumulative weighted sums rather than a single total, i.e., return a vector as long as `x` where the ith element is the weighted mean of the first i elements

```r
cumweighted.mean <- function(x, w) {
    ## handle NA values?
    x <- as.numeric(x)  # to avoid integer overflow
    w <- as.numeric(w)
    cumsum(x * w)[w != 0] / cumsum(w)
}
```
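As a quick check on this helper (the weights here are all nonzero, so the `[w != 0]` subsetting is a no-op); the definition is repeated so the snippet runs standalone:

```r
cumweighted.mean <- function(x, w) {
    x <- as.numeric(x)
    w <- as.numeric(w)
    cumsum(x * w)[w != 0] / cumsum(w)
}

cumweighted.mean(c(1, 4, 6), w = c(1, 2, 3))
## 1.0 3.0 4.5  -- the running weighted means of the prefixes
```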

You'd like something a little different

```r
myweighted.mean <- function(x)
    c(NA, cumweighted.mean(head(x, -1), head(seq_along(x), -1)))
```

This makes a single pass through the data, so scales linearly (at least in theory).
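As a sanity check, the linear-time version reproduces the result vector the question asked for (both definitions repeated so the snippet runs standalone):

```r
cumweighted.mean <- function(x, w) {
    x <- as.numeric(x)
    w <- as.numeric(w)
    cumsum(x * w)[w != 0] / cumsum(w)
}

myweighted.mean <- function(x)
    c(NA, cumweighted.mean(head(x, -1), head(seq_along(x), -1)))

vec <- c(1, 4, 6, 3, 2, 7)
myweighted.mean(vec)
## NA 1.000000 3.000000 4.500000 3.900000 3.266667
```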
