DW1 DW1 - 1 month ago 9
R Question

Referring to Previous element in middle of column in R

I have a data frame which has a date column and a cumulative sum column. The cumulative sum data ends at a certain point and I want to use a formula to calculate it for the rest of the dates in the date column. What I am having trouble with is having the formula reference the previous cell in the column, starting from where the count reverts to 0 (where the historical cumulative sum ends).

Example below:

dates.1 <- c("2016-12-06","2016-12-07","2016-12-08","2016-12-09","2016-12-10","2016-12-11","2016-12-12","2016-12-13","2016-12-14")
count.1 <- c(1,3,8,10,0,0,0,0,0)
drift <- .0456


df.1 <- data.frame(cbind(dates.1,count.1))


for (i in df.1$count.1) {
if (i == 0) {
head(df.1$count.1, n = 1L)+exp(drift+(qnorm(runif(5,0,1))))
}
}


I cant get the for loop to calculate it right.

The reason n = 5 for the runif is because that is the number of future entries I want to run the formula for.

The desired output would have something along the lines of

print(df.1$count.1)

[1] 1 3 8 10 12 13 16 17 18


The numbers after the 4th element are just random, the general idea is that the column would be overwritten, keeping the historical data and have the new calculated entries instead of the zeroes.

Any ideas?

Answer

There is no need to loop. You can get what you want by first identifying the row index at which the cumsum stopped:

last.ind <- which(df.1$count.1==0)[1]-1

Then use this last.ind to restart the cumsum:

set.seed(123)  ## for reproducibility
## simulation of rest of data to cumulatively sum
rest.of.data <- exp(drift+(qnorm(runif(5,0,1))))
df.1$count.1[last.ind:length(df.1$count.1)] <- cumsum(c(df.1$count.1[last.ind],rest.of.data))
print(df.1$count.1)
##[1]  1.00000  3.00000  8.00000 10.00000 10.59757 12.92824 13.75970 17.20085 22.17527

If you do want to use a loop, then you should do the following, which gives the same result but will be slower:

for (i in seq_len(length(df.1$count.1))) {
  if (df.1$count.1[i] == 0) {
    df.1$count.1[i] <- df.1$count.1[i-1] + exp(drift+(qnorm(runif(1,0,1))))
  }
}

Notes:

  1. Loop over indices of df1$.count.1 not values.
  2. If the value at the current index i is 0, write over that value with the sum of the previous value at i-1 and the data to be cumulatively summed.

Also, you should not use cbind to create your data.frame. Doing so in this case will result in df.1$count.1 being a factor instead of numeric. The data used is:

Data:

df.1 <- structure(list(dates.1 = structure(1:9, .Label = c("2016-12-06", 
"2016-12-07", "2016-12-08", "2016-12-09", "2016-12-10", "2016-12-11", 
"2016-12-12", "2016-12-13", "2016-12-14"), class = "factor"), 
    count.1 = c(1, 3, 8, 10, 0, 0, 0, 0, 0)), .Names = c("dates.1", 
"count.1"), row.names = c(NA, -9L), class = "data.frame")
##     dates.1 count.1
##1 2016-12-06       1
##2 2016-12-07       3
##3 2016-12-08       8
##4 2016-12-09      10
##5 2016-12-10       0
##6 2016-12-11       0
##7 2016-12-12       0
##8 2016-12-13       0
##9 2016-12-14       0