Captain_Obvious - 2 months ago
R Question

Why does sapply fail to use lubridate's parse_date_time while lapply does not?

Given: a simple 4x2 data frame filled with data of type character

Goal: the same data frame but with all the values replaced by the result of applying the following lubridate function call to them:

parse_date_time(df, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ")


It seems that using lapply works fine. When using sapply, the parse_date_time function returns strange large integers.

Here is the data:

df <- as.data.frame(stringsAsFactors = FALSE, matrix(c("2014-01-13 12:08:02", "2014-01-13 12:19:46", "2014-01-14 09:59:09", "2014-01-14 10:05:09", "6-18-2016 17:43:42", "6-18-2016 18:06:59", "6-27-2016 12:16:47", "6-27-2016 12:29:05"), nrow = 4, ncol = 2, byrow = TRUE))


colnames(df) <- c("starttime", "stoptime")


Here is the sapply call:

library(lubridate)

df2 <- sapply(df, FUN = function(column) {
  parse_date_time(column, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ")
})


And the lapply call:

df2 <- lapply(df, FUN = function(column) {
  parse_date_time(column, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ")
})


I understand that sapply returns the simplest data structure that it can and that lapply returns a list. Had sapply worked, it would have been followed by
df2 <- data.frame(df2)
so that I'd have the desired data frame as stated in the 'Goal' (note that I did do this with the successful lapply returned list).

My question: why does the parse_date_time function behave as expected inside lapply but not inside sapply? For reference, here are example outputs of the lapply and sapply calls, respectively:

2016-06-27 12:29:05


1467030545

Answer

The reason is that sapply has simplify = TRUE by default, so when the list elements all have the same length or dimensions, it simplifies the result to a vector or matrix. Internally, date-time classes are stored as numeric,

typeof(parse_date_time(df$starttime, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ"))
#[1] "double"

while the class is 'POSIXct'

class(parse_date_time(df$starttime, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ"))
#[1] "POSIXct" "POSIXt"  

so the simplification to a matrix drops the class attribute and leaves only the underlying numbers (seconds since 1970-01-01 UTC), while the list returned by lapply preserves the class of each element.
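To see the difference in isolation, here is a minimal sketch. It uses base as.POSIXct as a stand-in for parse_date_time (an assumption, so that it runs without lubridate), and the names parsed, kept, stripped and restored are made up for illustration.

```r
# Stand-in for the parsed columns: base as.POSIXct instead of parse_date_time
# (an assumption made so the sketch runs without lubridate).
parsed <- list(
  starttime = as.POSIXct(c("2016-06-18 17:43:42", "2016-06-27 12:16:47"), tz = "UTC"),
  stoptime  = as.POSIXct(c("2016-06-18 18:06:59", "2016-06-27 12:29:05"), tz = "UTC")
)

kept     <- lapply(parsed, identity)  # list: every element keeps its class
stripped <- sapply(parsed, identity)  # simplified to a matrix: class attribute is gone

class(kept$stoptime)     # "POSIXct" "POSIXt"
typeof(stripped)         # "double"
stripped[2, "stoptime"]  # 1467030545, seconds since 1970-01-01 UTC

# The number still carries the information; it only needs its class back
restored <- as.POSIXct(stripped[2, "stoptime"], origin = "1970-01-01", tz = "UTC")
format(restored)         # "2016-06-27 12:29:05"
```

So the "strange large integer" from the question is the same instant as the lapply output, just printed without the POSIXct class.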

If we want a data.frame, we can create a copy of 'df' and assign into it with [] so that the result keeps the same structure as 'df':

df2 <- df
df2[] <- lapply(df, FUN = function(column) {
  parse_date_time(column, orders = c("ymd_hms", "mdy_hms"), tz = "ETZ")
})

df2
#           starttime            stoptime
#1 2014-01-13 12:08:02 2014-01-13 12:19:46
#2 2014-01-14 09:59:09 2014-01-14 10:05:09
#3 2016-06-18 17:43:42 2016-06-18 18:06:59
#4 2016-06-27 12:16:47 2016-06-27 12:29:05
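As a rough illustration of why the [] matters, the sketch below compares a plain assignment with the [] assignment. It again uses base as.POSIXct instead of parse_date_time so that it runs without lubridate. The helper to_time and the names plain and kept are hypothetical.

```r
df <- data.frame(
  starttime = c("2014-01-13 12:08:02", "2014-01-14 09:59:09"),
  stoptime  = c("2014-01-13 12:19:46", "2014-01-14 10:05:09"),
  stringsAsFactors = FALSE
)

# Hypothetical helper standing in for the parse_date_time call
to_time <- function(column) as.POSIXct(column, tz = "UTC")

plain <- lapply(df, to_time)   # plain list: the data.frame structure is lost

kept <- df
kept[] <- lapply(df, to_time)  # fills the existing columns: still a data.frame

class(plain)                               # "list"
class(kept)                                # "data.frame"
sapply(kept, function(col) class(col)[1])  # "POSIXct" for both columns
```

The [] form replaces the contents column by column while leaving the data.frame attributes (class, names, row names) in place, so no extra data.frame() call is needed afterwards.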