Frederik Vanrenterghem Frederik Vanrenterghem - 3 months ago 7
R Question

Count number of occurences in date range in R

I have a dataframe with a number of accounts, their status and the start and endtime for that status. I would like to report on the number of accounts in each of these statuses over a date range. The data looks like the

df
below, with the resulting
report
. (Actual data contains more state values. N/A values are shown with a dummy date far in the future.)

df <- data.frame(account = c(1,1,2,3),
state = c("Open","Closed","Open","Open"),
startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"),
enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
)

report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6),
number.open = c(2,2,2,1,1,1)
)


I have looked at options involving
rowwise()
and
mutate
from
dplyr
and
foverlaps
from
data.table
, but haven't been able to code it up so it works.
(See Checking if Date is Between two Dates in R)

Answer

We can use sapply to do this for us:

report$NumberOpen <- 
    sapply(report$date, function(x)
    sum(as.Date(df1$startdate) < as.Date(x) &
    as.Date(df1$enddate) > as.Date(x) & 
    df1$state == 'Open'))

#  report
#         date NumberOpen
# 1 2016-04-01          2
# 2 2016-04-02          2
# 3 2016-04-03          2
# 4 2016-04-04          1
# 5 2016-04-05          1
# 6 2016-04-06          1

data

df1 <- data.frame(account = c(1,1,2,3),
                 state = c("Open","Closed","Open","Open"),
                 startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"), 
                 enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
)

report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6)
)
Comments