mwra mwra - 2 months ago 8
R Question

R: Monthly bins based on source date ranges

Binning examples for R I've found seem to assume the source data has a single date (or date/time). I've discrete start and stop dates for user accounts ranging over years 2002-2017. I want to output counts of the number of accounts active, using monthly bins, during the overall range 2002-17.

The data is currently in dd/mm/yyyy strings,though I could easily change that format if needed; rows are sorted by ascending start date. for example

Start Stop
04/09/2006 23/01/2014
...
06/07/2008 11/03/2017
...
30/09/2010 22/04/2016


The resulting counts would be, for example:

Mar 2006 0
Jan 2007 1
Mar 2011 3
Jun 2015 2
Sep 2016 1
...etc.


The aim of generating the counts is to the plot the total active accounts over time. I'm open to getting daily counts and then aggregating by month if easier. I'm stuck at the start though: bin where the source is a date range and not a single date.

Answer Source

Convert the columns to "yearmon" class and use mapply to generate the year/months ym covered. Then count how many of each year/month occur and merge that with a data frame having all the year/months between Jan 2002 and Dec 2017 giving M_na and replace NAs with 0 giving M.

library(zoo)

DF2 <- transform(DF, Start = as.yearmon(Start), Stop = as.yearmon(Stop))

ym <- unlist(mapply(seq, DF2$Start, DF2$Stop, MoreArgs = list(by = 1/12)))
Ag <- aggregate(ym^0, list(ym = as.yearmon(ym)), sum)

M_na <- merge(Ag, data.frame(ym = as.yearmon(seq(2002, 2017+11/12, 1/12))), all.y = TRUE)
M <- transform(M_na, x = replace(x, is.na(x), 0))

plot(x ~ ym, M, type = "h", xlab = "", ylab = "Count", xaxt = "n")
axis(1, 2002:2017)

Note: We assume the following input with Date class columns:

Lines <- "
Start       Stop
04/09/2006  23/01/2014
06/07/2008  11/03/2017
30/09/2010  22/04/2016"
DF <- read.table(text = Lines, header = TRUE)
fmt <- "%d/%m/%Y"
DF <- transform(DF, Start = as.Date(Start, fmt), Stop = as.Date(Stop, fmt))