ssan ssan - 3 months ago 23
R Question

Date difference between end date to start date

I have a data alooks like below.

id from data to date
1 2015-03-09 2015-03-14
2 2015-02-22 2015-02-24
2 2015-05-06 2015-05-17
3 2015-02-12 2015-02-16
4 2015-03-10 2015-03-16
4 2015-03-22 2015-04-07
4 2015-06-07 2015-07-07
4 2015-07-06 2015-07-07
4 2015-08-02 2015-08-07


I want to create a seperate variable which is the difference between the to date and the next from date grouped by id.
So the first time of the id will be NA.I tried the below method based on the other answer in stackoverflow and I could not
achieve that.

library(data.table)
chf1 = data.table(id = chf$id,from date = chf$f.date,to_date = chf$t.date)
setkey(chf1,id)
chf1[,diff:=c(NA,difftime(from_date, to_date, units = "days")),by=id]


The output look like

id from_date to_date difference
1 2015-03-09 2015-03-14 NA
2 2015-02-22 2015-02-24 NA
2 2015-05-06 2015-05-17 71
3 2015-02-12 2015-02-16 NA
4 2015-03-10 2015-03-16 NA
4 2015-03-22 2015-04-07 6
4 2015-06-07 2015-06-10 64
4 2015-07-06 2015-07-07 26
4 2015-08-02 2015-08-07 26

Answer

There are three issues in the code

1) chf1$from_date, chf1$to_date gets the whole column, so there is no effect of grouping by 'id'

2) difftime gives output with the same length as the initial column length.

3) As difftime takes the difference between each element of 'from_date' with corresponding element of 'to_date', there is no need for by = id

Therefore, the code can be

chf1[, diff1:=difftime(from_date, to_date, units = "days")]
chf1
#   id  from_date    to_date    diff1
#1:  1 2015-03-09 2015-03-14  -5 days
##2:  2 2015-02-22 2015-02-24  -2 days
#3:  2 2015-05-06 2015-05-17 -11 days
#4:  3 2015-02-12 2015-02-16  -4 days
#5:  4 2015-03-10 2015-03-16  -6 days
#6:  4 2015-03-22 2015-04-07 -16 days
#7:  4 2015-06-07 2015-07-07 -30 days
#8:  4 2015-07-06 2015-07-07  -1 days
#9:  4 2015-08-02 2015-08-07  -5 days

Based on the description in OP's code, if we need to get the difference between the next value of 'from_date', after grouping by 'id', use the difftime on the shifted 'from_date' with that of 'to_date' and assign (:= it to 'diff1'.

chf1[,  diff1 := difftime(shift(from_date, type = "lead"), to_date, 
                        units = "days") , by = id]
chf1
#  id  from_date    to_date   diff1
#1:  1 2015-03-09 2015-03-14 NA days
#2:  2 2015-02-22 2015-02-24 71 days
#3:  2 2015-05-06 2015-05-17 NA days
#4:  3 2015-02-12 2015-02-16 NA days
#5:  4 2015-03-10 2015-03-16  6 days
#6:  4 2015-03-22 2015-04-07 61 days
#7:  4 2015-06-07 2015-07-07 -1 days
#8:  4 2015-07-06 2015-07-07 26 days
#9:  4 2015-08-02 2015-08-07 NA days

Or it could be

chf1[, diff1 := difftime(from_date, shift(to_date), units = "days"), by = id]

data

chf <- structure(list(id = c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 4L), 
f.date = structure(c(16503, 
16488, 16561, 16478, 16504, 16516, 16593, 16622, 16649), class = "Date"), 
t.date = structure(c(16508, 16490, 16572, 16482, 16510, 16532, 
16623, 16623, 16654), class = "Date")), .Names = c("id", 
 "f.date", "t.date"), row.names = c(NA, -9L), class = "data.frame")

 chf1 = data.table(id = chf$id,from_date = chf$f.date,to_date = chf$t.date)
Comments