I am using the new version of
 "2008-04-01 09:00:00"
1: 2008-04-01 09:00:00
Error in charToDate(x) :
character string is not in a standard unambiguous format
DT[ , newType := IDateTime(strptime(oldType, "%d%b%Y:%H:%M:%S"))]
Unfortunately (for efficiency) strptime produces a POSIXlt type, which is unsupported by data.table and always will be, due to its size (40 bytes per date!) and structure. Although as.POSIXct produces the much better POSIXct, it still does it via POSIXlt. More info here:
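The storage difference between the two base date-time classes can be seen directly in base R, without data.table. A minimal sketch for a single timestamp; the UTC time zone is an assumption, chosen only so the result doesn't depend on the machine's local zone.

```r
# Parse one timestamp as POSIXlt (tz = "UTC" is an assumption for reproducibility).
lt <- as.POSIXlt("2008-04-01 09:00:00", tz = "UTC")

# POSIXlt is a list holding one vector per calendar field
# (sec, min, hour, mday, mon, year, wday, yday, isdst, ...).
lt_fields <- names(unclass(lt))
print(lt_fields)

# POSIXct is a single double: seconds since 1970-01-01 00:00:00 UTC.
ct <- as.POSIXct(lt)
ct_secs <- as.numeric(ct)
print(typeof(unclass(ct)))
print(ct_secs)
```

One number per timestamp (POSIXct) versus a list of per-field vectors (POSIXlt) is what makes the former so much easier to store in a column and to sort.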
Looking at base functions, as.Date uses strptime too, creating an integer offset from the epoch that is (oddly) stored as double. The IDate (and friends) class in data.table aims to achieve integer epoch offsets stored as, um, integer. That makes them suitable for fast sorting by base::sort.list(method = "radix") (which is really a counting sort).
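To make the storage point concrete, here is a small base-R sketch (it does not use IDate itself; as.integer() stands in for the integer storage IDate aims for). The example dates are arbitrary.

```r
# Base Date: an offset in days from 1970-01-01, but stored as double.
d <- as.Date(c("2012-12-24", "2008-04-01", "2010-06-15"))
print(typeof(d))

# The same day offsets stored as integer.
days <- as.integer(d)
print(typeof(days))

# Integer keys can be ordered with the radix method of sort.list().
o <- sort.list(days, method = "radix")
sorted_days <- days[o]
print(sorted_days)
```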
IDate doesn't really aim to be fast at conversion (which is usually a one-off).
So to convert string dates/times, rightly or wrongly, I tend to roll my own helper function.
If the string date is "2012-12-24", I'd lean towards as.integer(gsub("-", "", col)) and proceed with YYYYMMDD integer dates. Similarly, times can be stored as HHMMSS integers. Two columns, date and time, stored separately can be useful if you generally want to roll = TRUE within a day, but not to the previous day. Grouping by month is simple and fast: by = date %/% 100L. Adding and subtracting days is troublesome, but it is anyway, because rarely do you want to add calendar days; you usually want weekdays or business days instead. So that's a lookup into your business-day vector anyway.
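A minimal base-R sketch of the YYYYMMDD/HHMMSS approach, month grouping, and business-day lookup. The example columns, the business-day calendar and the add_business_days() helper are all hypothetical, made up for illustration; in data.table the grouping would be by = date %/% 100L, and table() is used here only to keep the sketch dependency-free.

```r
# Hypothetical example columns, as they might arrive from a file.
date_chr <- c("2012-12-24", "2012-12-27", "2013-01-02")
time_chr <- c("09:00:00", "16:30:15", "10:05:00")

# YYYYMMDD and HHMMSS integers: strip the separators, then convert.
date <- as.integer(gsub("-", "", date_chr, fixed = TRUE))
time <- as.integer(gsub(":", "", time_chr, fixed = TRUE))

# Month key: integer division drops the DD part (20121224L -> 201212L).
month <- date %/% 100L
print(table(month))

# Hypothetical business-day calendar (sorted YYYYMMDD integers),
# skipping weekends and the 25/26 Dec and 1 Jan holidays.
business_days <- c(20121221L, 20121224L, 20121227L, 20121228L,
                   20121231L, 20130102L)

# Hypothetical helper: add n business days by stepping along the calendar.
# Returns NA if d is not a business day or the result falls off the calendar.
add_business_days <- function(d, n, calendar) {
  idx <- match(d, calendar) + n
  idx[!is.na(idx) & idx < 1L] <- NA_integer_  # avoid negative-index dropping
  calendar[idx]
}
print(add_business_days(20121224L, 2L, business_days))
```

Because the calendar vector is the source of truth, holidays and weekends are handled by whatever goes into it rather than by date arithmetic.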
In your case the character month would need a conversion to 1:12. There isn't a separator within your dates ("01APR2008"), so substring would be one way, followed by fmatch (from the fastmatch package) on the month name. Are you in control of the file format? If so, numbers are better in an unambiguous format that sorts naturally such as
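For strings like "01APR2008:09:00:00", the substring-then-match idea might look like the sketch below. It assumes fixed-width DDMMMYYYY:HH:MM:SS input with English month abbreviations, and parse_ddmmmyyyy_hms() is a hypothetical name. Base match() is used so the sketch has no dependencies; fastmatch's fmatch() takes the same x and table arguments and could be swapped in.

```r
# Hypothetical helper for fixed-width "DDMMMYYYY:HH:MM:SS" strings
# (English month abbreviations assumed).
parse_ddmmmyyyy_hms <- function(x) {
  months <- toupper(month.abb)                        # "JAN" ... "DEC"
  day  <- as.integer(substr(x, 1L, 2L))
  mon  <- match(toupper(substr(x, 3L, 5L)), months)   # 1:12, NA if unknown
  year <- as.integer(substr(x, 6L, 9L))
  time <- as.integer(gsub(":", "", substr(x, 11L, 18L), fixed = TRUE))
  list(date = year * 10000L + mon * 100L + day,       # YYYYMMDD integer
       time = time)                                   # HHMMSS integer
}

res <- parse_ddmmmyyyy_hms(c("01APR2008:09:00:00", "15DEC2011:17:30:05"))
print(res$date)
print(res$time)
```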
I haven't yet got to how best to do this in fread, so date/times are currently left as character, because I'm not yet sure how to detect the date format or which type to output. It does need to output either integer or double dates though, rather than inefficient character. I suspect my use of YYYYMMDD integers is seen as unconventional, so I'm a little hesitant to make that the default. They have their place, and epoch-based dates have their pros and cons too. All I'm suggesting is that dates don't always have to be epoch-based.
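One concrete trade-off between the two representations: YYYYMMDD integers sort and group directly, but naive arithmetic on them can produce non-dates, so calendar arithmetic needs a detour through an epoch-based type. A small sketch with two hypothetical converter helpers:

```r
# Hypothetical converters between YYYYMMDD integers and base Date.
yyyymmdd_to_date <- function(x) as.Date(as.character(x), format = "%Y%m%d")
date_to_yyyymmdd <- function(d) as.integer(format(d, "%Y%m%d"))

x <- 20081231L
# Naive +1 on the integer gives 20081232L, which is not a real date.
naive_next <- x + 1L
# Going via Date (an epoch offset) rolls over the month and year correctly.
proper_next <- date_to_yyyymmdd(yyyymmdd_to_date(x) + 1L)
print(c(naive_next, proper_next))
```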
What do you think? Btw, thanks for the encouragement on fread; it was nice to see.