JackeJR JackeJR - 4 months ago 24
R Question

Difference between as.POSIXct/as.POSIXlt and strptime for converting character vectors to POSIXct/POSIXlt

I have followed a number of questions here that asks about how to convert character vectors to datetime classes. I often see 2 methods, the strptime and the as.POSIXct/as.POSIXlt methods. I looked at the 2 functions but am unclear what the difference is.


function (x, format, tz = "")
y <- .Internal(strptime(as.character(x), format, tz))
names(y$year) <- names(x)
<bytecode: 0x045fcea8>
<environment: namespace:base>


function (x, tz = "", ...)
<bytecode: 0x069efeb8>
<environment: namespace:base>


function (x, tz = "", ...)
<bytecode: 0x03ac029c>
<environment: namespace:base>

Doing a microbenchmark to see if there are performance differences:

Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 5000, replace = TRUE)
df <- microbenchmark(strptime(Dates, "%d-%m-%Y"), as.POSIXlt(Dates, format = "%d-%m-%Y"), times = 1000)

Unit: milliseconds
expr min lq median uq max
1 as.POSIXlt(Dates, format = "%d-%m-%Y") 32.38596 33.81324 34.78487 35.52183 61.80171
2 strptime(Dates, "%d-%m-%Y") 31.73224 33.22964 34.20407 34.88167 52.12422

strptime seems slightly faster. so what gives? why would there be 2 similar functions or are there differences between them that I missed?


Well, the functions do different things.

First, there's two internal implementations of date/time: POSIXct, which stores seconds since UNIX epoch (+some other data), and POSIXlt, which stores a list of day, month, year, hour, minute, second, etc.

strptime is a function to directly convert character vectors (of a variety of formats) to POSIXlt format.

as.POSIXlt converts a variety of data types to POSIXlt. It tries to be intelligent and do the sensible thing - in the case of character, it acts as a wrapper to strptime.

as.POSIXct converts a variety of data types to POSIXct. It also tries to be intelligent and do the sensible thing - in the case of character, it runs strptime first, then does the conversion from POSIXlt to POSIXct.

It makes sense that strptime is faster, because strptime only handles character input whilst the others try to determine which method to use from input type. It should also be a bit safer in that being handed unexpected data would just give an error, instead of trying to do the intelligent thing that might not be what you want.