Valentin - 1 year ago 101
R Question

# busy percentage for intervals in R

I work in a hospital. Our doctors work on-call during the night and evening hours. There may be times when no patients are coming so they can rest. At other times, many patients will be there at once.
They write down when they started and when they stopped treating a patient. With the lubridate package, I can transform these data into intervals with specific dates. The length of these intervals will vary a lot as treatment may be more or less complicated. Also when a lot is going on, a doctor may be going back and forth between patients. So a typical entry will look like this: "2016-06-11 21:45:00 UTC" "2016-06-11 22:35:00 UTC"

To see which times of day are usually very busy and which are rather slow, I would like to use these data. This should also be possible for different days of the week.
The whole thing should probably look like a bar graph showing what the average occupation would be at any time of day (for example 100% occupation between 8 and 9 pm, 40% between 1 and 2 am).
My problem is that I don't know how to do that. ggplot will not handle intervals and I have not found any package that will do this average or percentage for intervals.

I hope I was able to make clear what I need and what my problems are. I'm not an experienced programmer but happy to learn.

Thanks a lot

Valentin

Edit:

Sorry, I should have thought of that. So here is as far as I have come:

``````>Daten<-read.csv2("Dienstdatum.csv")
>Beginn<-parse_date_time(Daten\$Beginn,"dmy HM“,tz="CET“)
>Ende<-parse_date_time(Daten\$Ende,"dmy HM",tz="CET“)
##Interval with date information
>Daten\$Intervalle<-interval(Beginn,Ende)
##Intervals stripped of date
>Daten\$Beg<-as.POSIXct(strftime(Beginn, format="%H:%M:%S"), format="%H:%M:%S")
> Daten\$dur<-as.duration(Daten\$Intervall)
> Daten\$Interv<-as.interval(Daten\$dur , Daten\$Beg)
>Daten\$Wochentage<-weekdays(Beginn)
``````

This way I have time intervals pointing to the same date and I have the weekdays to sort the data by. This is where I am stuck, because I know of no way to some kind of histogram on intervals. I could just use starting dates, but that would be heavily skewed, because an interval might be between 5 mins and 2 hours.

I hope the code helps. If you need some exemplary data, just tell me.

EDIT (2):
These are the raw data
https://www.dropbox.com/s/tok32wzt9wjmjih/Dienstdatum.csv?dl=0

and the output from dput:
https://www.dropbox.com/s/wgtw68rw9n0ksct/Output%20Dput.rtf?dl=0

I am afraid the data are not as well structured as could be, but it should still work. Not sure if it is a good idea to post the output inline, so I provided the file.

So after digging around and trying different things, here is what i came up with and what works for me. The salution is somewhat convoluted and verbose bus apparently gives exact results. In trying to find the answer, I learned about the vonders of vectorisation (pardon my german accent) since producing vectorised code cut the time it took to compute the results to about 3 mins whereas before i stopped calculations after about 96 hours without having completed.

Please note that the list of documented dates (not every doctor will have completed documentation of his shift) is an excel sheet with simple dates. the list of documented times working intervals is a date and time someone started seeing a patient in one column and stopped seeing that patient in another. The next row will be similar start and stop time and date.

All the variables in the text are in german or are abbreviations of german words, but I hope my comments are sufficient to understand what's going on. Also, a lot of the code is for problems specific to my situation.

Special thanks to users PhiSeu and user3507085 who helped me with different aspects of the solution.

``````#read dates
package(lubridate)
#convert start dates to POSIX
Daten\$Beginn<-parse_date_time(Daten\$Beginn,"dmy HM",tz="CET")
#prevent overlap by adding one second
Daten\$Beginn<-Daten\$Beginn+1
#convert end dates to POSIX
Daten\$Ende<-parse_date_time(Daten\$Ende,"dmy HM",tz="CET")
#remove empty rows
Daten<-na.omit(Daten)
#create intervals in which people worked
Daten\$Intervall<-interval(Daten\$Beginn,Daten\$Ende)
#read dates on which people worked
doku<-parse_date_time(doku\$V1,"%d.%m.%Y",tz="cet")

#create a start time of 09 A.M. for shifts
doku<-data.frame(cbind(doku,doku+32400))
names(doku)<-c("Datum","Beginn")
#convert to POSIX
doku\$Datum<-as.POSIXct(doku\$Datum,origin="1970-01-01",tz="cet")
doku\$Beginn<-as.POSIXct(doku\$Beginn,origin="1970-01-01",tz="cet")

#Loop to create 15 min intervals for each documented shift spanning 24 hour against which actual working hours will be checked

begin <- as.POSIXct(doku\$Beginn)

# copy begin time for loop
begin_new <- begin

# create duration object
aufl <- duration(15, "mins")

# count times for loop
times <- 24*60/15

# create dataframe with begin time
Intervall <- data.frame(begin,stringsAsFactors = FALSE)

for (i in 1:times){

cat("test",i,"\n")

# save old time for interval calculation
begin_start <- begin_new
# add 15 Minutes to original time
begin_new <- begin_new + aufl

cat(begin_new,"\n")

# create an interval object between
new_dur <- interval(begin_start,begin_new)

# bind to original dataframe
Intervall <- cbind(Intervall,new_dur)

}

vec_names <- paste0("v",c(1:(times+1)))
colnames(Intervall) <- vec_names

#create a matrix of the number of seconds worked in each of the above 15 intervals by checking the amount of intersection between 15 intervals and documented intervals of work

test<-vector()
Tabelle<-matrix(nrow=length(doku\$Beginn),ncol=times)
Tabelle[is.na(Tabelle)]<-0
for (j in 1:length(doku\$Beginn)){
for (k in 1:times){
test<-as.duration(intersect(Daten\$Intervall,Intervall[j,k+1]))
test[is.na(test)]<-0
test<-sum(test)
Tabelle[j,k]<-test}}

#cadd start time to the above matrix
Ausw<-data.frame(cbind(Tabelle,begin))
#convert to POSIX
Ausw\$begin<-as.POSIXct(Ausw\$begin,origin="1970-01-01",tz="cet")

##analysis of data
#common to all days of the week
#create labels for 15 min intervals
Labels<-c("09","09:15","09:30","09:45","10","10:15","10:30","10:45","11","11:15","11:30","11:45","12","12:15","12:30","12:45","13","13:15","13:30","13:45","14","14:15","14:30","14:45","15","15:15","15:30","15:45","16","16:15","16:30","16:45","17","17:15","17:30","17:45","18","18:15","18:30","18:45","19","19:15","19:30","19:45","20","20:15","20:30","20:45","21","21:15","21:30","21:45","22","22:15","22:30","22:45","23","23:15","23:30","23:45","00","00:15","00:30","00:45","01","01:15","01:30","01:45","02","02:15","02:30","02:45","03","03:15","03:30","03:45","04","04:15","04:30","04:45","05","05:15","05:30","05:45","06","06:15","06:30","06:45","07","07:15","07:30","07:45","08","08:15","08:30","08:45")

##analysis for weekends
#how many percent people worked on average in any of the 15 min intervals on a saturday or sunday
Wochenende<-apply(Ausw[Ausw\$wtag==c(1,7),1:times],MARGIN=2,FUN=sum)
Prozent<-Wochenende/length(Ausw\$begin[Ausw\$wtag==c(1,7)]) /as.numeric(aufl)*100

names(Prozent)<-Labels
#plot as barplot and add axis labels
b=barplot(Prozent,axes = F,axisnames=F,main="Durchschnittliche Arbeitsbelastung am Wochenende",sub="über 100%: Übergabezeiten",xlab="Uhrzeit",ylab="Prozent")
axis(1,at=c(b[seq(1,length(Labels),4)],b[length(b)]+diff(b)[1]),labels = c(Labels[seq(1,length(Labels),4)],"09"))
axis(2,at=seq(0,160,25),las=2)

##analysos monday to friday
Woche<-apply(Ausw[Ausw\$wtag==c(2,3,4,5,6),1:times],MARGIN=2,FUN=sum)
Prozent2<-Woche/length(Ausw\$begin[Ausw\$wtag==c(2,3,4,5,6)]) /as.numeric(aufl)*100