Slav - 1 month ago 17
R Question

# R unlist and multiply (date intervals)

trying to calculate number of cases between 2 dates, there is table that contains a number and time interval, i want to create output table with dates and sum of cases. The easy (and resolved) problem is:

`````` df <- data.frame(person = c("A", "B", "C"), start = c("2014-01-01", "2014-01-03", "2014-01-04"), stop = c("2014-01-02", "2014-01-06", "2014-01-04") )
df

f1 = function() {  #keeping dates
as.data.frame(table(unlist(apply(df[-1], 1,
function(x) as.character(seq(as.Date(x[1], "%Y-%m-%d"),
as.Date(x[2], "%Y-%m-%d"), "1 day"))))))}
f1()
``````

and it would return

``````        Var1 Freq
1 2014-01-01    1
2 2014-01-02    1
3 2014-01-03    1
4 2014-01-04    2
5 2014-01-05    1
6 2014-01-06    1
``````

what i would need is to summarize the first column instead, with the input data like this

`````` df <- data.frame(cases = c(5, 2, 2), start = c("2014-01-01", "2014-01-03", "2014-01-04"), stop = c("2014-01-02", "2014-01-06", "2014-01-04") )
``````

it should return

``````        Var1 cases
1 2014-01-01    5
2 2014-01-02    5
3 2014-01-03    2
4 2014-01-04    4
5 2014-01-05    2
6 2014-01-06    2
``````

maybe it would not be even case for unlisting, what can i potentially use to calculate number of cases per day? and is there any way to show 0 values if the date is valid between start and end but does not have any occurrences in the data

Here is a solution that is in-line with what you are doing with `f1`:

``````f2 <- function(df) {
df2 <- do.call(rbind, lapply(1:nrow(df), function(i) {
Var1 <- as.character(seq(as.Date(df\$start[i],format="%Y-%m-%d"),
as.Date(df\$stop[i],format="%Y-%m-%d"),"day"))
cases <- rep(df\$cases[i],length(Var1))
data.frame(Var1,cases)
}))
aggregate(df2[,-1], by=list(Var1=df2[,1]), FUN=sum)
}
``````

In `f2`:

1. Build a data frame `df2` from the sequence of dates from `df\$start` to `df\$stop` for each row in `df`. Here, `lapply` is used to loop over each row of `df` and the `cases` for each row is repeated to match the `length` of the resulting sequence of dates. Then use `rbind` to combine each of these data frames by rows.
2. Then `aggregate` (from the `stats` package) this `df2` by dates (i.e., `Var1`) and `sum` up the `cases`.

``````f2(df)