Neil Neil - 1 month ago 7
R Question

converting multiple date formats into one in r

I am working with messy excel file with multiple date formats

2016-10-17T12:38:41Z
Mon Oct 17 08:03:08 GMT 2016
10-Sep-15
13-Oct-09
18-Oct-2016 05:42:26 UTC


I want to convert all of the above in
yyyy-mm-dd
format. I am using following code for the conversion but lot of values are coming NA.

as.Date(parse_date_time(df$date,c('mdy', 'ymd_hms','a b d HMS y','d b y HMS')))


How can I do it all of them together. I have read other threads on similar case,but nothing seems to work for my case.
Please help

Answer

If I add 'dmy' to the list then at least all of the cases in your example are succesfully parsed:

 z <- c("2016-10-17T12:38:41Z", "Mon Oct 17 08:03:08 GMT 2016", 
 "10-Sep-15",  "13-Oct-09", "18-Oct-2016 05:42:26 UTC")

library(lubridate)
parse_date_time(z,c('mdy', 'dmy', 'ymd_HMS','a b d HMS y','d b y HMS'))
## [1] "2016-10-17 12:38:41 UTC" "2016-10-17 08:03:08 UTC"
## [3] "2015-09-10 00:00:00 UTC" "2009-10-13 00:00:00 UTC"
## [5] "2016-10-18 05:42:26 UTC"

Your big problem will be the third and fourth elements: are these actually meant to be 'ymd' and 'dmy' respectively? I'm not sure how any logic will let you auto-detect these differences ... out of context, "15 Sep 2010" and "10 September 2015" both seem perfectly reasonable possibilities ...

For what it's worth I also tried the new anytime package - it only handled the first and last element.