Slav Slav - 1 month ago 17
R Question

R unlist and multiply (date intervals)

trying to calculate number of cases between 2 dates, there is table that contains a number and time interval, i want to create output table with dates and sum of cases. The easy (and resolved) problem is:

df <- data.frame(person = c("A", "B", "C"), start = c("2014-01-01", "2014-01-03", "2014-01-04"), stop = c("2014-01-02", "2014-01-06", "2014-01-04") )
df

f1 = function() { #keeping dates
as.data.frame(table(unlist(apply(df[-1], 1,
function(x) as.character(seq(as.Date(x[1], "%Y-%m-%d"),
as.Date(x[2], "%Y-%m-%d"), "1 day"))))))}
f1()


and it would return

Var1 Freq
1 2014-01-01 1
2 2014-01-02 1
3 2014-01-03 1
4 2014-01-04 2
5 2014-01-05 1
6 2014-01-06 1


what i would need is to summarize the first column instead, with the input data like this

df <- data.frame(cases = c(5, 2, 2), start = c("2014-01-01", "2014-01-03", "2014-01-04"), stop = c("2014-01-02", "2014-01-06", "2014-01-04") )


it should return

Var1 cases
1 2014-01-01 5
2 2014-01-02 5
3 2014-01-03 2
4 2014-01-04 4
5 2014-01-05 2
6 2014-01-06 2


maybe it would not be even case for unlisting, what can i potentially use to calculate number of cases per day? and is there any way to show 0 values if the date is valid between start and end but does not have any occurrences in the data

Answer

Here is a solution that is in-line with what you are doing with f1:

f2 <- function(df) {
  df2 <- do.call(rbind, lapply(1:nrow(df), function(i) {
    Var1 <- as.character(seq(as.Date(df$start[i],format="%Y-%m-%d"),
                             as.Date(df$stop[i],format="%Y-%m-%d"),"day"))
    cases <- rep(df$cases[i],length(Var1))
    data.frame(Var1,cases)
  }))
  aggregate(df2[,-1], by=list(Var1=df2[,1]), FUN=sum)
}

In f2:

  1. Build a data frame df2 from the sequence of dates from df$start to df$stop for each row in df. Here, lapply is used to loop over each row of df and the cases for each row is repeated to match the length of the resulting sequence of dates. Then use rbind to combine each of these data frames by rows.
  2. Then aggregate (from the stats package) this df2 by dates (i.e., Var1) and sum up the cases.

With your data:

f2(df)
##        Var1 x
##1 2014-01-01 5
##2 2014-01-02 5
##3 2014-01-03 2
##4 2014-01-04 4
##5 2014-01-05 2
##6 2014-01-06 2
Comments