DeltaIV - 5 days ago
R Question

computing and formatting averages and squares of time intervals

I have a model which predicts the duration of certain events, and I also have measured durations for those events. I want to compute the difference between Predicted and Measured for each event, then the mean difference and the RMSE. I'm able to do it, but the formatting is really awkward and not what I expected:

database <- data.frame(Predicted = c(strptime(c("4:00", "3:35", "3:38"), format = "%H:%M")),
                       Measured = c(strptime(c("3:39", "3:40", "3:53"), format = "%H:%M")))
> database
            Predicted            Measured
1 2016-11-28 04:00:00 2016-11-28 03:39:00
2 2016-11-28 03:35:00 2016-11-28 03:40:00
3 2016-11-28 03:38:00 2016-11-28 03:53:00


This is the first weirdness: why does R show me both a date and a time, even though I clearly specified a time-only format (%H:%M) and there was no date in my data to start with? It gets weirder:

database$Error <- with(database, Predicted-Measured)
database$Mean_Error <- with(database, mean(Predicted-Measured))
database$RMSE <- with(database, sqrt(mean(as.numeric(Predicted-Measured)^2)))
> database
            Predicted            Measured    Error Mean_Error     RMSE
1 2016-11-28 04:00:00 2016-11-28 03:39:00  21 mins  0.3333333 15.17674
2 2016-11-28 03:35:00 2016-11-28 03:40:00  -5 mins  0.3333333 15.17674
3 2016-11-28 03:38:00 2016-11-28 03:53:00 -15 mins  0.3333333 15.17674


Why is the variable Error expressed in minutes? For Error it's not a bad choice, but it becomes quite hard to read for Mean_Error. For RMSE it's even worse, but this could be due to the as.numeric function: if I remove it, R complains that '^' not defined for "difftime" objects. My questions are:


  1. Is it possible to show the first 2 columns (Predicted and Measured) in the %H:%M format?

  2. For the other 3 columns (Error, Mean_Error and RMSE) I would like to compare a %M:%S format and a format in seconds only, and choose between the two. Is that possible?



EDIT: just to be clearer, my goal is to insert observations of time intervals into a data frame and compute a vector of time interval differences, then compute some statistics for that vector: mean, RMSE, etc. I know I could just enter the time observations in seconds, but that doesn't look very good: it's difficult to tell that 13200 seconds are 3 hours and 40 minutes. Thus I would like to be able to store the time intervals in the %H:%M format, but then be able to manipulate them algebraically and show the results in a format of my choosing. Is that possible?

Answer

We can use difftime to specify the units for the difference in time. The output of difftime is an object of class difftime. When this difftime object is coerced to numeric using as.numeric, we can change these units (see the examples in ?difftime):

## Note we don't convert to date-time because we just want %H:%M
database <- data.frame(Predicted = c("4:00", "3:35", "3:38"),
                       Measured = c("3:39", "3:40", "3:53"))
## We now convert to date-time and use difftime to compute difference in minutes
database$Error <- with(database,
                       difftime(strptime(Predicted, format = "%H:%M"),
                                strptime(Measured, format = "%H:%M"),
                                units = "mins"))
## Use as.numeric to change units to seconds
database$Mean_Error <- with(database, mean(as.numeric(Error,units="secs")))
database$RMSE <- with(database, sqrt(mean(as.numeric(Error,units="secs")^2)))
##  Predicted Measured    Error Mean_Error     RMSE
##1      4:00     3:39  21 mins         20 910.6042
##2      3:35     3:40  -5 mins         20 910.6042
##3      3:38     3:53 -15 mins         20 910.6042
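
The approach above leaves the %M:%S display from the question open. As a rough sketch of one way to get it: strptime returns a date-time object, which always carries a date (missing fields are filled in from the current date), so a time-only display has to come from format(). For the error columns, the seconds can be formatted by hand. The helper format_mmss below is hypothetical (it is not part of base R), and tz = "UTC" is an assumption added here to keep the example independent of local daylight-saving rules.

```r
# Hypothetical helper (not part of base R): format a signed number of
# seconds as "[-]MM:SS", rounding to the nearest whole second.
format_mmss <- function(secs) {
  secs <- round(secs)
  sign <- ifelse(secs < 0, "-", "")
  secs <- abs(secs)
  sprintf("%s%02d:%02d", sign,
          as.integer(secs %/% 60), as.integer(secs %% 60))
}

# tz = "UTC" is an assumption made for this sketch, to avoid DST surprises.
predicted <- strptime(c("4:00", "3:35", "3:38"), format = "%H:%M", tz = "UTC")
measured  <- strptime(c("3:39", "3:40", "3:53"), format = "%H:%M", tz = "UTC")

# Time-only display of a date-time object
predicted_hm <- format(predicted, "%H:%M")

# Errors in seconds, then the two summary statistics
error_secs <- as.numeric(difftime(predicted, measured, units = "mins"),
                         units = "secs")
mean_secs  <- mean(error_secs)
rmse_secs  <- sqrt(mean(error_secs^2))

predicted_hm              # "04:00" "03:35" "03:38"
format_mmss(error_secs)   # "21:00" "-05:00" "-15:00"
format_mmss(mean_secs)    # "00:20"
format_mmss(rmse_secs)    # "15:11"
```

Keeping the numeric seconds (error_secs, mean_secs, rmse_secs) alongside the formatted strings makes it easy to compare the two displays and pick one, since the strings themselves can no longer be used for arithmetic.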