DeltaIV - 8 months ago
R Question

computing and formatting averages and squares of time intervals

I have a model that predicts the duration of certain events, along with measured durations for those same events. I want to compute the difference between Predicted and Measured, the mean difference, and the RMSE. I can do it, but the formatting is really awkward and not what I expected:

database <- data.frame(Predicted = c(strptime(c("4:00", "3:35", "3:38"), format = "%H:%M")),
                       Measured = c(strptime(c("3:39", "3:40", "3:53"), format = "%H:%M")))
> database
            Predicted            Measured
1 2016-11-28 04:00:00 2016-11-28 03:39:00
2 2016-11-28 03:35:00 2016-11-28 03:40:00
3 2016-11-28 03:38:00 2016-11-28 03:53:00

This is the first weirdness: why does R show me a date and a time, even though I clearly specified a time-only format (%H:%M) and there was no date in my data to start with? It gets weirder:

database$Error <- with(database, Predicted-Measured)
database$Mean_Error <- with(database, mean(Predicted-Measured))
database$RMSE <- with(database, sqrt(mean(as.numeric(Predicted-Measured)^2)))
> database
            Predicted            Measured    Error Mean_Error     RMSE
1 2016-11-28 04:00:00 2016-11-28 03:39:00  21 mins  0.3333333 15.17674
2 2016-11-28 03:35:00 2016-11-28 03:40:00  -5 mins  0.3333333 15.17674
3 2016-11-28 03:38:00 2016-11-28 03:53:00 -15 mins  0.3333333 15.17674

Why is the variable Error expressed in minutes? For Error it's not a bad choice, but it becomes quite hard to read for Mean_Error. For RMSE it's even worse, but this could be due to the as.numeric function: if I remove it, R complains that '^' not defined for "difftime" objects. My questions are:

  1. Is it possible to show the first 2 columns (Predicted and Measured) in the %H:%M format?

  2. For the other 3 columns (Error, Mean_Error and RMSE) I would like to compare a clock-style format and a format in only seconds, and choose between the two. Is it possible?

EDIT: just to be clearer, my goal is to insert observations of time intervals into a data frame and compute a vector of time interval differences, then compute some statistics for that vector: mean, RMSE, etc. I know I could just enter the time observations in seconds, but that doesn't look very good: it's difficult to tell that 13200 seconds are 3 hours and 40 minutes. Thus I would like to store the time intervals in the %H:%M format, but still be able to manipulate them algebraically and show the results in a format of my choosing. Is that possible?

Answer

We can use difftime to specify the units for the difference in time. The output of difftime is an object of class difftime. When this difftime object is coerced to numeric using as.numeric, we can change these units (see the examples in ?difftime):

## Note we don't convert to date-time because we just want %H:%M
database <- data.frame(Predicted = c("4:00", "3:35", "3:38"),
                       Measured = c("3:39", "3:40", "3:53"))
## We now convert to date-time and use difftime to compute difference in minutes
database$Error <- with(database, difftime(strptime(Predicted, format = "%H:%M"),
                                          strptime(Measured, format = "%H:%M"),
                                          units = "mins"))
## Use as.numeric to change units to seconds
database$Mean_Error <- with(database, mean(as.numeric(Error,units="secs")))
database$RMSE <- with(database, sqrt(mean(as.numeric(Error,units="secs")^2)))
##  Predicted Measured    Error Mean_Error     RMSE
##1      4:00     3:39  21 mins         20 910.6042
##2      3:35     3:40  -5 mins         20 910.6042
##3      3:38     3:53 -15 mins         20 910.6042
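
On the display side of the question, one possible approach is to keep the errors as plain seconds for the arithmetic and convert to a clock-style string only when printing. The sketch below assumes that is acceptable. The helper format_hms is hypothetical (it is not part of base R), and tz = "UTC" is an added assumption to keep daylight-saving changes out of the differences.

```r
# Hypothetical helper (not in base R): format a signed duration given in
# seconds as [-]H:MM:SS, rounding to the nearest second.
format_hms <- function(secs) {
  sign <- ifelse(secs < 0, "-", "")
  s <- as.integer(round(abs(secs)))
  sprintf("%s%d:%02d:%02d", sign, s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

database <- data.frame(Predicted = c("4:00", "3:35", "3:38"),
                       Measured  = c("3:39", "3:40", "3:53"),
                       stringsAsFactors = FALSE)

# Parse only when needed; tz = "UTC" is an assumption made for this sketch
pred <- strptime(database$Predicted, format = "%H:%M", tz = "UTC")
meas <- strptime(database$Measured,  format = "%H:%M", tz = "UTC")

# Time-only display of a parsed date-time
format(pred, "%H:%M")            # "04:00" "03:35" "03:38"

# Plain numeric seconds, so mean(), ^ and sqrt() work without complaints
err_secs   <- as.numeric(difftime(pred, meas, units = "secs"))
mean_error <- mean(err_secs)
rmse       <- sqrt(mean(err_secs^2))

# Format for display only; the numeric values stay available for further math
database$Error <- format_hms(err_secs)   # "0:21:00" "-0:05:00" "-0:15:00"
format_hms(mean_error)                   # "0:00:20"
format_hms(rmse)                         # "0:15:11"
```

Because err_secs, mean_error and rmse stay numeric, the seconds-only view is simply the raw values, and the clock-style view is format_hms applied to them, so the two can be compared side by side.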