Sevland Sevland - 1 month ago 8
R Question

R - Converting NA to specific date within a date column

I am new to R, and I have a data frame in which one of the columns holds dates. It represents end dates, so whenever there is no end, the value is NA, which means "ongoing". Let's say my data lock is 2016-11-01; I would then like the NAs to turn into "2016-11-01". The reason I want a date rather than NA is that I want to run calculations on these data, and the NAs introduce a bias in my final result. I have read everything I could find, and multiple attempts based on that reading failed. I am sure it is some basic thing I am currently blind to.

dput is:

structure(list(traitm.start = structure(c(14039, 12410, 14495,
14378, 13238, 13859, 14732, 12762, 13853, 12675, 12784, 16465,
13958, 14155, 14123, 13860, 13055, 12809, 14822, 14816, 12476,
13081, 14183, 12475, 14560, 15026, 15006, 16514, 13993, 13963,
13257, 14173, 13013, 15435, 14463, 14999, 13480, 13915, 14536,
14904, 16865, 16436), class = "Date"), traitm.stop = structure(c(15908,
13633, 16733, 15078, NA, 14473, 15719, 12802, 14236, 12695, 16988,
NA, 14030, 15587, 15083, NA, 13584, 13634, NA, 15084, 12869,
15772, 16071, 12481, 16534, 15400, NA, 16863, 14781, 15198, 13390,
14963, 14426, 16988, 16289, 15405, NA, 14728, 15980, 15155, NA,
16841), class = "Date"), IS.rlp = c("1", "0", "0", "1", "1",
"1", "1", "1", "1", "0", "0", "1", "1", "0", "0", "1", "0", "1",
"0", "1", "1", "0", "0", "1", "1", "1", "0", "1", "0", "1", "1",
"0", "1", "0", "0", "1", "0", "1", "1", "0", "1", "0"), treat.lenght = structure(c(62,
41, 75, 23, NA, 20, 33, 1, 13, 1, 140, NA, 2, 48, 32, NA, 18,
28, NA, 9, 13, 90, 63, 0, 66, 12, NA, 12, 26, 41, 4, 26, 47,
52, 61, 14, NA, 27, 48, 8, NA, 14), class = "difftime", units = "days")), .Names = c("traitm.start",
"traitm.stop", "IS.rlp", "treat.lenght"), row.names = c(1L, 2L,
3L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 13L, 14L, 15L, 16L, 17L, 18L,
20L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 43L, 44L, 45L, 46L, 47L
), class = "data.frame")


And here is where I am stuck, with two problems:

using dplyr:

HMRoo2_Allo_M4 <- HMRoo2_Allo_M4%>%mutate(traitm.stop=
ifelse(is.na(HMRoo2_Allo_M4$traitm.stop) ==TRUE,
2016-11-01,HMRoo2_Allo_M4$traitm.stop))


1) How do I tell R that 2016-11-01 has to be a date?
2) In the final product, all the dates are turned into their internal numeric form. From there, I am struggling to get them back to date format.

Thanks for your help

Answer

Using data.table makes it look easy. First, basic setup:

install.packages("data.table") #optional, run if you don't have data.table package
library(data.table)

The operations you need:

setDT(df)   #turn into data.table
df[is.na(traitm.stop), traitm.stop := as.Date('2016-11-01')]
setDF(df)  #optional, turn back to Data.frame

Notes:

The general data.table form is DT[i, j, by]: i is the subset or join, j is the operation to perform, and by is the grouping. In our case, i is is.na(traitm.stop), which returns a logical vector with one element per row, so only the rows with a missing end date are selected. j is the assignment to traitm.stop, where := is data.table's assign-by-reference operator. as.Date tells R that the string '2016-11-01' is a date. There is no by, since we are working on the full data set.
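To make the three slots concrete, here is a minimal sketch on a toy table. The table dt and its columns patient, group and end_date are made up for illustration and are not part of the question's data; it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data, not the asker's data frame
dt <- data.table(
  patient  = 1:4,
  group    = c("a", "a", "b", "b"),
  end_date = as.Date(c("2015-01-10", NA, "2016-03-05", NA))
)

# i only: keep the rows whose end date is missing (this returns a new table)
missing_rows <- dt[is.na(end_date)]

# i and j: on those rows, overwrite end_date by reference
dt[is.na(end_date), end_date := as.Date("2016-11-01")]

# j and by: one summary row per group
latest <- dt[, .(latest_end = max(end_date)), by = group]
print(latest)
```

Because := modifies dt in place, there is no need to assign the result back to dt, and end_date keeps its Date class.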

setDT converts a data.frame into a data.table in place (no copy). setDF converts the data.table back into a data.frame, which is useful because users who are not familiar with data.table may have trouble with its syntax.
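If you would rather stay in base R, the same replacement can be sketched without any package. The small data frame below is a stand-in with illustrative values, and the names datalock, unquoted, stripped and restored are only chosen for this example. It also shows why the attempt in the question produced numbers: an unquoted 2016-11-01 is arithmetic, and ifelse() drops the Date class.

```r
# Toy stand-in for the question's data frame (illustrative values only)
df <- data.frame(
  traitm.start = as.Date(c("2008-06-09", "2006-03-31", "2007-12-12")),
  traitm.stop  = as.Date(c("2013-07-22", NA, "2009-08-17"))
)

# A quoted string passed to as.Date() becomes a Date
datalock <- as.Date("2016-11-01")

# Unquoted, R reads this as the subtraction 2016 - 11 - 1
unquoted <- 2016-11-01
print(unquoted)          # 2004

# ifelse() strips attributes, so the result is plain numeric, not Date
stripped <- ifelse(is.na(df$traitm.stop), datalock, df$traitm.stop)
print(class(stripped))   # "numeric"

# Such numbers are days since 1970-01-01 and can be converted back
restored <- as.Date(stripped, origin = "1970-01-01")

# Assigning through a logical index keeps the Date class intact
df$traitm.stop[is.na(df$traitm.stop)] <- datalock
print(df)
```

The last assignment is the base R counterpart of the data.table line above: is.na() picks the rows, and only those rows are overwritten.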

Results:

   traitm.start traitm.stop IS.rlp treat.lenght
 1:   2008-06-09  2013-07-22      1      62 days
 2:   2003-12-24  2007-04-30      0      41 days
 3:   2009-09-08  2015-10-25      0      75 days
 4:   2009-05-14  2011-04-14      1      23 days
 5:   2006-03-31  2016-11-01      1      NA days
 6:   2007-12-12  2009-08-17      1      20 days
 7:   2010-05-03  2013-01-14      1      33 days
 ...

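PS: To compute the number of months between the two dates correctly (df must still be a data.table here, i.e. before setDF; note that this creates a new column treat.length, since the original column is spelled treat.lenght):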

library(mondate)
df[, treat.length := MonthsBetween(mondate(traitm.stop),  mondate(traitm.start))]