GabyLP GabyLP - 1 month ago 23
R Question

R sqldf not recognizing posixct

I need to filter data by date using sqldf package.

My table, "episodes" has a field "created_at, which class is POSIXct.

episodes<-data.frame(created_at=seq(from = as.POSIXct('2016-10-01 01:00:00',tz="GMT"), length.out = 100, by = "days") )

> class(episodes$created_at)
[1] "POSIXct" "POSIXt"


I get the 2nd date with:

fechaMin=min(episodes$created_at)

library(lubridate)
fechaSig=fechaMin+hours(24)


And then I filter the data with:

sqldf("SELECT * from episodes e
where strftime('%Y/%m/%d', e.created_at, 'unixepoch')>='$fechaSig' ")


But I get all the data. The filter doesn't work.

I also tried with no success:

sqldf("SELECT * from episodes e
where date(e.created_at, 'unixepoch', 'localtime')>='$fechaSig' ")

Answer

First note that a library is a repository that stores packages so sqldf is a package, not a library. This is often a source of confusion because of the library command.

sqldf does not itself support $ substitution. To get that you need to use fn$sqldf where fn comes from the gsubfn package which the sqldf package automatically loads. It adds such substitution to the arguments of whatever function it prefaces. See ?fn

Also note that the SQLite database has no date or datetime classes so sqldf just sends the internal representation of POSIXct to SQLite, i.e. a plain number representing the number of seconds since the Epoch with respect to GMT. Note that the internal representation of POSIXct is always the number of seconds since the Epoch with respect to GMT even if it displays the date and time relative to a different time zone. Thus episodes$created_at and fechSig are both sent to SQLite from R as seconds since the Epoch relative to GMT even though they don't both display that way in R. When R receives a field back from the database (at this point it's just a plain number) sqldf checks if the field name was originally of POSIXct class and if so it coerces the number it gets back to POSIXct. The heuristic does not handle time zones so it is stored as relative to GMT (as all POSIXct variables are) and it is displayed in the local time zone which is the default behavior of POSIXct.

In the following note that GMT is 4 hours ahead of EDT and 5 hours ahead of EST so the answer is correct.

episodes <- data.frame(created_at = 
 seq(from = as.POSIXct('2016-10-01 01:00:00',tz="GMT"), length.out = 100, by = "days") )

fechaMin <- min(episodes$created_at)

library(lubridate)
fechaSig <- fechaMin + hours(24)

library(sqldf)
out <- fn$sqldf("select * from episodes where created_at >= $fechaSig")

range(episodes$created_at)
## [1] "2016-10-01 01:00:00 GMT" "2017-01-08 01:00:00 GMT"
range(out$created_at)
## [1] "2016-10-01 21:00:00 EDT" "2017-01-07 20:00:00 EST"

If you prefer to work exclusively in GMT then ensure that the local time zone is GMT like this:

 Sys.setenv(TZ = "GMT")
 range(out$created_at)
 ## [1] "2016-10-02 01:00:00 GMT" "2017-01-08 01:00:00 GMT"