DW1 - 1 year ago 54

R Question

I have a data frame which has a date column and a cumulative sum column. The cumulative sum data ends at a certain point and I want to use a formula to calculate it for the rest of the dates in the date column. What I am having trouble with is having the formula reference the previous cell in the column, starting from where the count reverts to 0 (where the historical cumulative sum ends).

Example below:

`dates.1 <- c("2016-12-06","2016-12-07","2016-12-08","2016-12-09","2016-12-10","2016-12-11","2016-12-12","2016-12-13","2016-12-14")`

count.1 <- c(1,3,8,10,0,0,0,0,0)

drift <- .0456

df.1 <- data.frame(cbind(dates.1,count.1))

for (i in df.1$count.1) {

if (i == 0) {

head(df.1$count.1, n = 1L)+exp(drift+(qnorm(runif(5,0,1))))

}

}

I cant get the for loop to calculate it right.

The reason n = 5 for the runif is because that is the number of future entries I want to run the formula for.

The desired output would have something along the lines of

`print(df.1$count.1)`

[1] 1 3 8 10 12 13 16 17 18

The numbers after the 4th element are just random, the general idea is that the column would be overwritten, keeping the historical data and have the new calculated entries instead of the zeroes.

Any ideas?

Answer Source

There is no need to loop. You can get what you want by first identifying the row index at which the `cumsum`

stopped:

```
last.ind <- which(df.1$count.1==0)[1]-1
```

Then use this `last.ind`

to restart the `cumsum`

:

```
set.seed(123) ## for reproducibility
## simulation of rest of data to cumulatively sum
rest.of.data <- exp(drift+(qnorm(runif(5,0,1))))
df.1$count.1[last.ind:length(df.1$count.1)] <- cumsum(c(df.1$count.1[last.ind],rest.of.data))
print(df.1$count.1)
##[1] 1.00000 3.00000 8.00000 10.00000 10.59757 12.92824 13.75970 17.20085 22.17527
```

If you do want to use a loop, then you should do the following, which gives the same result but will be **slower**:

```
for (i in seq_len(length(df.1$count.1))) {
if (df.1$count.1[i] == 0) {
df.1$count.1[i] <- df.1$count.1[i-1] + exp(drift+(qnorm(runif(1,0,1))))
}
}
```

Notes:

- Loop over indices of
`df1$.count.1`

not values. - If the value at the current index
`i`

is`0`

, write over that value with the sum of the previous value at`i-1`

and the data to be cumulatively summed.

Also, you should not use `cbind`

to create your `data.frame`

. Doing so in this case will result in `df.1$count.1`

being a `factor`

instead of `numeric`

. The data used is:

**Data:**

```
df.1 <- structure(list(dates.1 = structure(1:9, .Label = c("2016-12-06",
"2016-12-07", "2016-12-08", "2016-12-09", "2016-12-10", "2016-12-11",
"2016-12-12", "2016-12-13", "2016-12-14"), class = "factor"),
count.1 = c(1, 3, 8, 10, 0, 0, 0, 0, 0)), .Names = c("dates.1",
"count.1"), row.names = c(NA, -9L), class = "data.frame")
## dates.1 count.1
##1 2016-12-06 1
##2 2016-12-07 3
##3 2016-12-08 8
##4 2016-12-09 10
##5 2016-12-10 0
##6 2016-12-11 0
##7 2016-12-12 0
##8 2016-12-13 0
##9 2016-12-14 0
```