CptNemo - 1 year ago 48

R Question

Given an ordered vector `vec <- c(1, 4, 6, 3, 2, 7)`, I want to compute, for each element `i` of `vec`, a weighted mean of the elements that precede `i`.

The function should proceed as follows.

- For the first element, `1`, it should return `NA` (no previous element).
- For the second element, `4`, it should return `1`.
- For the third element, `6`, it should return `weighted.mean(x = c(1,4), w = c(1,2))`.
- For the fourth element, `3`, it should return `weighted.mean(x = c(1,4,6), w = c(1,2,3))`.

The resulting vector `result` should satisfy `length(result) == length(vec)` and be equal to `c(NA, 1, 3, 4.5, 3.9, 3.266667)`.

UPDATE:

I clearly mean without using a loop such as:

```
result <- numeric()
for (i in 1:length(vec)) {
    if (i == 1) {
        result <- c(result, NA)
    } else {
        previous_elements <- vec[1:(i-1)]
        result <- c(result,
                    weighted.mean(x = previous_elements, w = 1:length(previous_elements)))
    }
}
```

Answer

Here's a naive implementation. Create a function that does what you say; the only 'clever' thing is to use the function `seq_len()` instead of `1:i` to generate the indexes, because `seq_len(0)` is empty whereas `1:0` is `c(1, 0)`

```
## weighted mean of the first i - 1 elements, with weights 1, 2, ..., i - 1
fun = function(i, vec)
    weighted.mean(head(vec, i - 1), w=seq_len(i - 1))
```

and then use it in sapply

```
sapply(seq_along(vec), fun, vec)
```

This is good enough: it gives `NaN` as the first element, rather than `NA`, but that's easily corrected after the fact (or conceptually accepted as the right answer). It's also better than your solution, although it is still 'using a loop'; the difference is that the management of the result vector is done by `sapply()`, rather than in your loop where you have to manage it yourself. In particular, your 'copy and append' approach is very bad performance-wise, because it makes a copy of the existing result each time through the loop. It's better to pre-allocate a result vector of the appropriate length, `result = numeric(length(vec))`, and then fill it with `result[[i]] = ...`, and better still to just let `sapply()` do the right thing for you!
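As a minimal sketch of the pre-allocated loop described above, assuming the question's `vec` and reusing its name `result`; filling with `NA_real_` up front is a choice made here so that the first element needs no special case:

```r
vec <- c(1, 4, 6, 3, 2, 7)

# Allocate the whole result once, filled with NA, instead of growing it
result <- rep(NA_real_, length(vec))

# seq_along(vec)[-1] is 2, 3, ..., n, and is empty for vectors shorter than 2
for (i in seq_along(vec)[-1]) {
    # weighted mean of the first i - 1 elements, with weights 1, 2, ..., i - 1
    result[[i]] <- weighted.mean(vec[seq_len(i - 1)], w = seq_len(i - 1))
}

result
```

This still does quadratic work overall, but it no longer copies `result` on every iteration.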

The problem is that it scales quadratically: you make a pass along `vec` to process each element, and then for each element you make a second pass to calculate the weighted mean, so there are `n (n - 1) / 2` calculations. So...

Take a look at `weighted.mean`

```
> stats:::weighted.mean.default
function (x, w, ..., na.rm = FALSE)
{
    ## SNIP -- edited for brevity
    w <- as.double(w)
    if (na.rm) {
        i <- !is.na(x)
        w <- w[i]
        x <- x[i]
    }
    sum((x * w)[w != 0])/sum(w)
}
```

and use `cumsum()` instead of `sum()` to get cumulative weighted sums and cumulative weights, rather than a single total of each, i.e., return a vector as long as `x`, where the ith element is the weighted mean up to that point:

```
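cumweighted.mean <- function(x, w) {
    ## handle NA values?
    x <- as.numeric(x) # to avoid integer overflow
    w <- as.numeric(w)
    xw <- x * w
    ## zero-weight terms contribute nothing; zero them in place rather than
    ## dropping them, so numerator and denominator stay the same length
    xw[w == 0] <- 0
    cumsum(xw) / cumsum(w)
}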
```

You'd like something a little different

```
## shift by one: element i gets the weighted mean of elements 1, ..., i - 1
myweighted.mean <- function(x)
    c(NA, cumweighted.mean(head(x, -1), head(seq_along(x), -1)))
```

This makes a single pass through the data, so scales linearly (at least in theory).
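As a quick sanity check, the sketch below compares a compact linear-time variant with the quadratic `sapply()` approach on the question's data. The name `prev_weighted_mean` is hypothetical, and the weights are assumed to be `1, 2, ..., n` as in the question.

```r
vec <- c(1, 4, 6, 3, 2, 7)

# Quadratic reference: recompute the weighted mean of the prefix for every i
slow <- sapply(seq_along(vec), function(i) {
    weighted.mean(head(vec, i - 1), w = seq_len(i - 1))
})
slow[is.nan(slow)] <- NA  # the empty prefix yields NaN; report it as NA

# Linear variant: running weighted sums, shifted right by one position
prev_weighted_mean <- function(x) {
    w <- as.numeric(seq_along(x))  # weights 1, 2, ..., n
    c(NA, head(cumsum(x * w) / cumsum(w), -1))
}
fast <- prev_weighted_mean(vec)

all.equal(slow, fast)
```

Both versions should agree up to floating-point tolerance; for long vectors, timing them with `system.time()` is one way to see the difference in scaling.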