Rufus Shinra Rufus Shinra - 23 days ago 5
R Question

How to convert Date or Datetime field when some parts are blank; na.omit fails

I have a data set that has dates and times for in and out. Each line is an in and out set, but some are blank. I can remove the blanks with na.omit and a nice read in (it was a csv, and

na.strings=c("")
works on the
read.csv
).

Of course, because the real world is never like the tutorial, some of the times are only dates, so my
as.POSIXlt(Dataset$In,format="%m/%d/%Y %H:%M")
returns NA on the "only date no time"s.

na.omit does not remove these lines. so the questions are 2


  1. Why doesn't na.omit work, or how can I get it to work?

  2. Better, How can I convert one column into both Dates and Times (in the posix format) without 2 calls or with some sort of optional parameter in the format string? (or is this even possible?).



This is a sample of the dates and times. I can't share the real file, 1 it's huge, 2 it's PII.

Id,In,Out
1,8/15/2015 8:00,8/15/2015 17:00
1,8/16/2015 8:04,8/16/2015
1,8/17/2015 8:50,8/17/2015 18:00
1,8/18/2015,8/18/2015 17:00
2,8/15/2015,8/15/2015 13:00
2,8/16/2015 8:00,8/16/2015 17:00
3,8/15/2015 4:00,8/15/2015 11:00
3,8/16/2015 9:00,8/16/2015 19:00
3,8/17/2015,8/17/2015 17:00
3,,
4,,
4,8/16/2015 6:00,8/16/2015 20:00

Answer
DF <- read.table(text = "Id,In,Out
                 1,8/15/2015 8:00,8/15/2015 17:00
                 1,8/16/2015 8:04,8/16/2015
                 1,8/17/2015 8:50,8/17/2015 18:00
                 1,8/18/2015,8/18/2015 17:00
                 2,8/15/2015,8/15/2015 13:00
                 2,8/16/2015 8:00,8/16/2015 17:00
                 3,8/15/2015 4:00,8/15/2015 11:00
                 3,8/16/2015 9:00,8/16/2015 19:00
                 3,8/17/2015,8/17/2015 17:00", header = TRUE, sep = ",",
                 stringsAsFactors = FALSE) #set this option during import


DF$In[nchar(DF$In) < 13] <- paste(DF$In[nchar(DF$In) < 13], "0:00")
DF$Out[nchar(DF$Out) < 13] <- paste(DF$Out[nchar(DF$Out) < 13], "0:00")

DF$In <- as.POSIXct(DF$In, format = "%m/%d/%Y %H:%M", tz = "GMT")
DF$Out <- as.POSIXct(DF$Out, format = "%m/%d/%Y %H:%M", tz = "GMT")
#  Id                  In                 Out
#1  1 2015-08-15 08:00:00 2015-08-15 17:00:00
#2  1 2015-08-16 08:04:00 2015-08-16 00:00:00
#3  1 2015-08-17 08:50:00 2015-08-17 18:00:00
#4  1 2015-08-18 00:00:00 2015-08-18 17:00:00
#5  2 2015-08-15 00:00:00 2015-08-15 13:00:00
#6  2 2015-08-16 08:00:00 2015-08-16 17:00:00
#7  3 2015-08-15 04:00:00 2015-08-15 11:00:00
#8  3 2015-08-16 09:00:00 2015-08-16 19:00:00
#9  3 2015-08-17 00:00:00 2015-08-17 17:00:00

na.omit doesn't work with POSIXlt objects because it is documented to "handle vectors, matrices and data frames comprising vectors and matrices (only)." (see help("na.omit")). And in the strict sense, POSIXlt objects are not vectors:

unclass(as.POSIXlt(DF$In))
#$sec
#[1] 0 0 0 0 0 0 0 0 0
#
#$min
#[1]  0  4 50  0  0  0  0  0  0
#
#$hour
#[1] 8 8 8 0 0 8 4 9 0
#
#$mday
#[1] 15 16 17 18 15 16 15 16 17
#
#$mon
#[1] 7 7 7 7 7 7 7 7 7
#
#$year
#[1] 115 115 115 115 115 115 115 115 115
#
#$wday
#[1] 6 0 1 2 6 0 6 0 1
#
#$yday
#[1] 226 227 228 229 226 227 226 227 228
#
#$isdst
#[1] 0 0 0 0 0 0 0 0 0
#
#attr(,"tzone")
#[1] "GMT"

There is hardly any reason to prefer POSIXlt over POSIXct (which is an integer giving the number of seconds since the origin internally and thus needs less memory).