Peter Calhoun Peter Calhoun - 1 year ago 58
R Question

Merging times between start and stop events in R

I am trying to merge two datasets where one dataset contains a datetime variable (

) and another dataset contains start and stop times (
). I want to take all datetimes between the start and stop times. I came up with a simple solution, but it takes over 3 hours for a dataset with around a million observations. Can someone please provide a faster solution? Below is an example with my solution; it works well for small sample sizes, but reducing
greatly increases the computation time (setting
takes almost 40s) .

secBetwMeas <- 5*60
dataA <- data.frame(id=c(rep("A",length(seq(as.POSIXct("2014-01-01 01:00:00", format="%Y-%m-%d %H:%M:%S"),as.POSIXct("2014-01-02 04:00:00", format="%Y-%m-%d %H:%M:%S"), by=secBetwMeas))),
rep("B",length(seq(as.POSIXct("2014-06-01 04:00:00", format="%Y-%m-%d %H:%M:%S"),as.POSIXct("2014-06-02 05:00:00", format="%Y-%m-%d %H:%M:%S"), by=secBetwMeas)))),
dataDtTm=c(seq(as.POSIXct("2014-01-01 01:00:00", format="%Y-%m-%d %H:%M:%S"),
as.POSIXct("2014-01-02 04:00:00", format="%Y-%m-%d %H:%M:%S"), by=secBetwMeas),
seq(as.POSIXct("2014-06-01 04:00:00", format="%Y-%m-%d %H:%M:%S"),
as.POSIXct("2014-06-02 05:00:00", format="%Y-%m-%d %H:%M:%S"), by=secBetwMeas)))
dataB <- data.frame(id=rep(c("A","B"),each=2),
startDtTm=c(as.POSIXct("2014-01-01 01:10:00", format="%Y-%m-%d %H:%M:%S"),
as.POSIXct("2014-01-02 03:05:00", format="%Y-%m-%d %H:%M:%S"),
as.POSIXct("2014-06-01 04:30:00", format="%Y-%m-%d %H:%M:%S"),
as.POSIXct("2014-06-01 22:10:00", format="%Y-%m-%d %H:%M:%S")),
endDtTm=c(as.POSIXct("2014-01-01 08:30:00", format="%Y-%m-%d %H:%M:%S"),
as.POSIXct("2014-01-02 07:05:00", format="%Y-%m-%d %H:%M:%S"),
as.POSIXct("2014-06-01 08:30:00", format="%Y-%m-%d %H:%M:%S"),
as.POSIXct("2014-06-02 04:05:00", format="%Y-%m-%d %H:%M:%S")))

### Trying to optimize this solution: ###
dataA$endDtTm <- dataA$startDtTm <- as.POSIXct(NA)
for(i in 1:NROW(dataA)){
index <- (dataA$id[i]==dataB$id & dataA$dataDtTm[i] >= dataB$startDtTm & dataA$dataDtTm[i] <= dataB$endDtTm)
stopifnot(sum(index)==0 || sum(index)==1)
dataA$startDtTm[i] <- dataB$startDtTm[index]
dataA$endDtTm[i] <- dataB$endDtTm[index]
dataA <- na.omit(dataA)
head(dataA) #This is the dataset I want to see

Answer Source

Does this work?

dataC <- merge(dataA,dataB, by="id")
dataC[dataC$dataDtTm >= dataC$startDtTm & dataC$dataDtTm <= dataC$endDtTm,]
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download