M.Qasim - 1 year ago 77
R Question

# Estimate gaps between multiple dates

Is there any way to find gaps between multiple timelines. For example my data looks like the following:

``````library(plyr);library(dplyr)

df <- data.frame(Language = structure(c(rep("English",7), rep("German",5), rep("French", 10)), class = "character"),
Students = c(LETTERS[1:7], LETTERS[1:5], LETTERS[1:10]),
Start = structure(c(16713,16713,16713,16744,16713,16714,16754,16729,16729,16729,16750,16769,
16724,16724,16745,16724,16759,16766,16723,16722,16736,16796), class = "Date"),
End = structure(c(16762,16720,16762,16755,16720,16764,16762,16765,16765,16749,16761,16770,16758,
16744,16758,16764,16765,16766,16726,16723,16758,16806), class = "Date"))

ddply(df, .(Language), summarise,
FirstDay = min(Start),
LastDay = max(End),
Duration = LastDay - FirstDay)

plot(gvisTimeline(data=df, rowlabel = "Class", start = "Start", end = "End", options=list(width=600, height=1000) ))
``````

I am after calculating gaps when none students are taking a class. Gaps are highlighted in red in the following chart.

This is a fairly classic problem. The solution about this is to filter rows based on if the start date is larger than previous maximum end date assuming rows are sorted by start date before hand. `lag` function and `cummax()` can be used to find out previous max end date, and since `cummax()` is not defined for Date class, we can convert it to integer, apply `cummax` and then convert it back:

``````library(dplyr)
df %>%
arrange(Start) %>% group_by(Language) %>%
mutate(End_Max = lag(as.Date(cummax(as.integer(End)), "1970-01-01"))) %>%
filter(Start > End_Max + 1) %>% select(Language, End_Max, Start)

# Source: local data frame [2 x 3]
# Groups: Language [2]

#  Language    End_Max      Start
#    <fctr>     <date>     <date>
#1   German 2015-11-26 2015-11-30
#2   French 2015-11-27 2015-12-27
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download