Essi Shams - 23 days ago
R Question

Histogramming the number of tasks in time intervals

I have a very large data frame which includes two columns containing start time and end time of a large number of tasks during a day.

My goal is to histogram the number of tasks occurring in intervals of 30 minutes (I may need to change the interval, but I think that would be easy).

Here is an example of my start and end times in a sample data frame:

StartTime <- c("8:30","8:25","10:15","11:30","12:15","12:30","1:00","2:35")
EndTime <- c("9:00","10:05","12:00","1:05","2:06","2:58","3:30","4:00")
TaskTimes <- data.frame(StartTime, EndTime)

This is tricky because a task can span several intervals, so I have to take both its start time and its end time into account.

Is there an easy way to do this without building a temporary data frame containing the number of tasks in each time period?

Answer

Here is some code. I convert the times to timestamps first, then use a double loop to find every overlap between a task and an interval, incrementing that interval's count for each one.

StartTime <- c("8:30","8:25","10:15","11:30","12:15","12:30","1:00","2:35")
EndTime <- c("9:00","10:05","12:00","1:05","2:06","2:58","3:30","4:00")
TaskTimes <- data.frame(StartTime,EndTime)

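# Parse as POSIXct (strptime() returns POSIXlt, which is awkward as a data frame column).
TaskTimes$s <- as.POSIXct(strptime(TaskTimes$StartTime, "%H:%M"))
TaskTimes$e <- as.POSIXct(strptime(TaskTimes$EndTime, "%H:%M"))

# Assumption: the sample uses a 12-hour clock without AM/PM, so times that
# parse as earlier than 8:00 (e.g. "1:00") are treated as afternoon.
day_start <- as.POSIXct(strptime("8:00", "%H:%M"))
TaskTimes$s <- TaskTimes$s + ifelse(TaskTimes$s < day_start, 12 * 60 * 60, 0)
TaskTimes$e <- TaskTimes$e + ifelse(TaskTimes$e < day_start, 12 * 60 * 60, 0)

width <- 30 * 60 # interval width in seconds (half an hour)
s <- as.numeric(as.POSIXct(strptime("0:00", "%H:%M")))
df <- data.frame(tick = seq(s, s + 24 * 60 * 60 - width, by = width), count = 0)
for (i in seq_len(nrow(df))) {
  for (j in seq_len(nrow(TaskTimes))) {
    # A task overlaps the interval [tick, tick + width) if it starts before
    # the interval ends and ends after the interval starts.
    if (as.numeric(TaskTimes$s[j]) < df$tick[i] + width &&
        as.numeric(TaskTimes$e[j]) > df$tick[i]) {
      df$count[i] <- df$count[i] + 1
    }
  }
}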

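# Convert the ticks back to date-times so the x axis shows clock times.
plot(as.POSIXct(df$tick, origin = "1970-01-01"), df$count, type = "s",
     xlab = "Interval start", ylab = "Number of tasks")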
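
The same overlap test can also be written without explicit loops, which may matter for a very large data frame. The following is only a minimal sketch, not the method used above: `to_minutes()` and `count_tasks()` are hypothetical helper names, times are handled as minutes since midnight rather than timestamps, and hours earlier than 8 are assumed to be afternoon times on a 12-hour clock (an assumption about the sample data, which has no AM/PM marker).

```r
# Hypothetical helper: convert "H:MM" strings to minutes since midnight.
# Assumption: hours earlier than `pm_before` are afternoon (12-hour clock).
to_minutes <- function(x, pm_before = 8) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  h <- as.numeric(vapply(parts, `[`, character(1), 1))
  m <- as.numeric(vapply(parts, `[`, character(1), 2))
  h <- ifelse(h < pm_before, h + 12, h)
  h * 60 + m
}

# Hypothetical helper: count the tasks overlapping each interval
# [tick, tick + width), with all values in minutes.
count_tasks <- function(start, end, width = 30) {
  tick <- seq(0, 24 * 60 - width, by = width)
  # Each outer() call builds a task-by-interval logical matrix;
  # a task overlaps when it starts before the interval ends
  # and ends after the interval starts.
  overlaps <- outer(start, tick + width, "<") & outer(end, tick, ">")
  data.frame(tick = tick, count = colSums(overlaps))
}

StartTime <- c("8:30","8:25","10:15","11:30","12:15","12:30","1:00","2:35")
EndTime <- c("9:00","10:05","12:00","1:05","2:06","2:58","3:30","4:00")

counts <- count_tasks(to_minutes(StartTime), to_minutes(EndTime), width = 30)

# Label each bar with the clock time at which its interval starts.
labels <- sprintf("%d:%02d", counts$tick %/% 60, counts$tick %% 60)
barplot(counts$count, names.arg = labels, las = 2)
```

Changing the interval only requires a different `width`, for example `width = 60` for hourly bins. Note that `outer()` allocates a matrix with one row per task and one column per interval, so for very many tasks it trades memory for speed.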