janosdivenyi - 3 years ago
R Question

Why does dplyr's mutate() change the time format?

I use readr to read in data that contains a date-time column. I can read it in correctly using the col_types option of readr.



library(dplyr)
library(readr)

sample <- "time,id
2015-03-05 02:28:11,1674
2015-03-03 13:10:59,36749
2015-03-05 07:55:48,NA
2015-03-05 06:13:19,NA
"

mydf <- read_csv(sample, col_types="Ti")
mydf
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3 2015-03-05 07:55:48    NA
# 4 2015-03-05 06:13:19    NA


This is nice. However, if I want to manipulate this column with dplyr, the time column loses its format.

mydf %>% mutate(time = ifelse(is.na(id), NA, time))
#         time    id
# 1 1425522491  1674
# 2 1425388259 36749
# 3         NA    NA
# 4         NA    NA


Why is this happening?

I know I can work around this problem by converting the column to character first, but it would be more convenient not to convert back and forth.

mydf %>% mutate(time = as.character(time)) %>%
  mutate(time = ifelse(is.na(id), NA, time))

Answer

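It's actually ifelse() that is causing this issue, not dplyr::mutate(). ifelse() returns a vector with the attributes of its test argument and only the data values of yes and no, so the POSIXct class is dropped and you are left with the underlying number of seconds since 1970-01-01. An example of this attribute stripping is shown in help(ifelse):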

## ifelse() strips attributes
## This is important when working with Dates and factors
x <- seq(as.Date("2000-02-29"), as.Date("2004-10-04"), by = "1 month")
## has many "yyyy-mm-29", but a few "yyyy-03-01" in the non-leap years
y <- ifelse(as.POSIXlt(x)$mday == 29, x, NA)
head(y) # not what you expected ... ==> need restore the class attribute:
class(y) <- class(x)
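
Applied to a date-time vector like the one in the question, the same restoration might look like the sketch below. It uses base R only; the vectors are rebuilt by hand from the sample data, and the "UTC" time zone is an assumption made so the example is reproducible.

```r
## Rebuild the two columns from the question (time zone assumed to be UTC)
time <- as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
                     "2015-03-05 07:55:48", "2015-03-05 06:13:19"),
                   tz = "UTC")
id <- c(1674L, 36749L, NA, NA)

## ifelse() keeps only the data values, so the result is a plain numeric vector
stripped <- ifelse(is.na(id), NA, time)
class(stripped)

## Copy the attributes (class and tzone) back from the original vector
restored <- stripped
attributes(restored) <- attributes(time)
class(restored)
restored
```

Copying all attributes rather than only the class also brings back the tzone attribute, which ifelse() drops as well.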

So there you have it. It's a bit of extra work if you want to use ifelse(), because you have to restore the class yourself afterwards. Here are two possible methods that will get you to your desired result without ifelse(). The first is really simple and uses is.na<-.

## mark 'time' as NA if 'id' is NA
is.na(mydf$time) <- is.na(mydf$id)

## resulting in
mydf
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA
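
Note that is.na<- modifies mydf in place. As a rough sketch of what it does, the default method amounts to assigning NA by index, which goes through the POSIXct assignment method and so keeps the class. The example below works on copies of the column; the names by_is_na and by_index are made up for illustration, and the data frame is rebuilt with an assumed "UTC" time zone.

```r
## Sample data rebuilt by hand (time zone assumed to be UTC)
mydf <- data.frame(
  time = as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
                      "2015-03-05 07:55:48", "2015-03-05 06:13:19"),
                    tz = "UTC"),
  id = c(1674L, 36749L, NA, NA)
)

## Variant 1: is.na<- on a copy, leaving mydf untouched
by_is_na <- mydf$time
is.na(by_is_na) <- is.na(mydf$id)

## Variant 2: plain indexed assignment of NA
by_index <- mydf$time
by_index[is.na(mydf$id)] <- NA

## Both variants are still date-times
class(by_is_na)
class(by_index)
```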

If you don't wish to choose that route, and want to continue with the dplyr method, you can use replace() instead of ifelse(). replace() assigns the new values into a copy of its first argument, so the attributes are kept.

mydf %>% mutate(time = replace(time, is.na(id), NA))
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA
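
Another option, not part of the original answer, is dplyr's own if_else(), which is stricter about types than ifelse() and keeps the date-time class. The sketch below assumes a dplyr version that provides if_else() (0.5.0 or later) and passes a POSIXct NA so that both branches have the same class; the data frame is rebuilt by hand with an assumed "UTC" time zone.

```r
library(dplyr)

## Sample data rebuilt by hand (time zone assumed to be UTC)
mydf <- data.frame(
  time = as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
                      "2015-03-05 07:55:48", "2015-03-05 06:13:19"),
                    tz = "UTC"),
  id = c(1674L, 36749L, NA, NA)
)

## Both branches are POSIXct, so the result stays POSIXct
result <- mydf %>%
  mutate(time = if_else(is.na(id), as.POSIXct(NA), time))

class(result$time)
```

Depending on the dplyr version, the printed time zone of the result may differ from the input, but the underlying instants are unchanged.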

Data:

mydf <- structure(list(time = structure(c(1425551291, 1425417059, 1425570948, 
1425564799), class = c("POSIXct", "POSIXt"), tzone = ""), id = c(1674L, 
36749L, NA, NA)), .Names = c("time", "id"), class = "data.frame", row.names = c(NA, 
-4L))