jebyrnes - 1 month ago
R Question

Simulating a timeseries in dplyr instead of using a for loop

So, while lag() and lead() in dplyr are great, I want to simulate a time series of something like population growth. My old-school code would look something like:

tdf <- data.frame(time = 1:5, pop = 50)
for (i in 2:5) {
    tdf$pop[i] <- 1.1 * tdf$pop[i - 1]
}


which produces

time pop
1 1 50.000
2 2 55.000
3 3 60.500
4 4 66.550
5 5 73.205


I feel like there has to be a dplyr or tidyverse way to do this (as much as I love my for loop).

But, something like

library(dplyr)
tdf <- data.frame(time = 1:5, pop = 50) %>%
    mutate(pop = 1.1 * lag(pop))


which would have been my first guess, just produces

time pop
1 1 NA
2 2 55
3 3 55
4 4 55
5 5 55


I feel like I'm missing something obvious... what is it?

Note: this is a trivial example. My real examples use multiple parameters, many of which are time-varying (I'm simulating forecasts under different GCM scenarios), so the tidyverse is proving to be a powerful tool for bringing my simulations together.

Answer

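mutate() evaluates lag(pop) once against the original column, where every value is 50, so each row gets 1.1 * 50 (and the first row gets NA); the updated value is never fed forward. Reduce (or its purrr variants, if you like) is what you want for cumulative functions that don't already have a cum* version written: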

library(dplyr)
# x is the running value; y (the next element of pop) is ignored
data.frame(time = 1:5, pop = 50) %>%
    mutate(pop = Reduce(function(x, y){x * 1.1}, pop, accumulate = TRUE))

##   time    pop
## 1    1 50.000
## 2    2 55.000
## 3    3 60.500
## 4    4 66.550
## 5    5 73.205
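
To make it clearer what Reduce is doing with accumulate = TRUE, here is a minimal sketch in base R only, outside the data frame. The names pop0, acc, nxt and closed_form are just illustrative. It checks the accumulated values against the closed form for constant growth, 50 * 1.1^(t - 1):

```r
# Same accumulation as in the answer, on a plain vector (base R only).
pop0 <- rep(50, 5)

# `acc` is the running value carried from step to step;
# `nxt` (the next element of pop0) is deliberately ignored.
pop <- Reduce(function(acc, nxt) acc * 1.1, pop0, accumulate = TRUE)

# Closed form for constant growth: pop_t = 50 * 1.1^(t - 1)
closed_form <- 50 * 1.1^(0:4)

# Compare with a numerical tolerance rather than exact equality.
print(all.equal(pop, closed_form))
```

Because the function only looks at the running value, the input vector really just supplies the starting value and the number of steps.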

Or, with purrr:

library(dplyr)
library(purrr)
data.frame(time = 1:5, pop = 50) %>%
    mutate(pop = accumulate(pop, ~.x * 1.1))

##   time    pop
## 1    1 50.000
## 2    2 55.000
## 3    3 60.500
## 4    4 66.550
## 5    5 73.205
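
Since the question mentions time-varying parameters, the same idea should extend to that case by accumulating over the parameter column instead of over pop. The following is a rough sketch, not a definitive recipe: the rate column and its values are made up for illustration, and it assumes rate[i] is the multiplier that takes the population from time i - 1 to time i (so the first entry is unused).

```r
library(dplyr)
library(purrr)

# Hypothetical per-step growth rates; rate[i] moves the population
# from time i - 1 to time i, so the first entry is unused (NA).
sim <- data.frame(time = 1:5, rate = c(NA, 1.1, 1.2, 0.9, 1.05)) %>%
    # .x is the running population, .y is the current step's rate;
    # .init supplies the starting population of 50.
    mutate(pop = accumulate(rate[-1], ~ .x * .y, .init = 50))

print(sim)
```

Dropping the first rate and passing .init = 50 keeps the result the same length as the data frame: four rates plus the initial value give five populations. The base R equivalent would be Reduce(function(x, y) x * y, rate[-1], accumulate = TRUE, init = 50).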