Jack Jack - 3 months ago 9
R Question

Generate dates between two dates in a dataframe

Below are the input and output. I want to transform a data frame from Input format to Output Format.

I have written a piece of code for now. it does the job. but I think it's very inefficient. are there any better packages or functions that can handle this?

mycode:

#create a output data frame to be apended later
output = data.frame(id1 = character(0), id2 = character(0), dates = character(0))

# for loop to get all possible combiation of dates

for (i in c(1:nrow(input))) {
end = as.Date('2016-07-18')
len = as.numeric(end-input$min_date[i])
output = rbind(output, as.data.frame(cbind(
pid = rep(input$id1[i],len),
cid = rep(input$id2[i],len),
dates = as.character(seq(input$min_date[i], end, by='day'))
)
)
)


}

Input:

+------+--------+------------+------------+
| id1 | id2 | min_date | max_date |
+------+--------+------------+------------+
| 3575 | 155443 | 2012-06-18 | 2016-07-18 |
| 3575 | 155450 | 2012-06-12 | 2016-07-18 |
+------+--------+------------+------------+


output:

+------+--------+------------+
| id1 | id2 | dates |
+------+--------+------------+
| 3575 | 155443 | 2012-06-18 |
| 3575 | 155443 | 2012-06-19 |
| 3575 | 155443 | 2012-06-20 |
| 3575 | 155443 | .. |
| 3575 | 155443 | … |
| 3575 | 155443 | 2016-07-18 |
| | | |
| 3575 | 155450 | 2012-06-12 |
| 3575 | 155450 | 2012-06-13 |
| 3575 | 155450 | 2012-06-14 |
| 3575 | 155450 | … |
| 3575 | 155450 | … |
| 3575 | 155450 | 2016-07-18 |
+------+--------+------------+

Answer

Assuming that the 'min_date/max_date' columns are Date class, we use Map to get the sequence of each 'min_date' with the corresponding 'max_date' in a list, replicate the sequence of rows of 'df1' with the number of rows of the list elements, create a data.frame by expanding the dataset based on 'i1' and get create 'dates' by concatenating the 'lst' elements.

lst <- Map(function(x, y) seq(x,y, by = "1 day"), df1$min_date, df1$max_date)
i1 <- rep(1:nrow(df1), lengths(lst)) 
data.frame(df1[i1,-3], dates = do.call("c", lst))

Or if we are using dplyr

library(dplyr)
df1 %>%
   rowwise() %>% 
   do(data.frame(.[1:2], date = seq(.$min_date, .$max_date, by = "1 day")))

Or using data.table, we can do this in a single line of code

library(data.table) 
setDT(df1)[,.(date = seq(min_date, max_date, by = "1 day")) ,.(id1, id2)]
Comments