hachondeoro hachondeoro - 3 months ago 12
R Question

R: Merging 2 data frames with index on previous and past rows

Lets say I have a dataset like this:

origin=data.frame(Date=as.Date(c("2016-08-05","2016-08-04","2016-08-03")),
L=c(1,2,3),
Type=c("H","L","H"))

Date L Type
1 2016-08-05 1 H
2 2016-08-04 2 L
3 2016-08-03 3 H

end=data.frame(Date=as.Date(c("2016-08-05","2016-08-04","2016-08-03","2016-08-02","2016-08-01")),
N=c(5,4,3,2,1))

Date N
1 2016-08-05 50
2 2016-08-04 40
3 2016-08-03 30
4 2016-08-02 20
5 2016-08-01 10


I want something like this:

Date L Type N
1 2016-08-03 3 H 30
1 2016-08-03 3 H 20
1 2016-08-04 2 L 40
1 2016-08-04 2 L 30
2 2016-08-05 1 H 50
3 2016-08-05 1 H 40


I want to maintain the original column date of "origin", and in the merge, I want to merge it with the current and previous dates values of "end", like a sort of merge with a loop. In other posts, only the common values are matched, which gives the 3 rows result:

merge(origin,end, by = "Date")

Date L Type N
1 2016-08-03 3 H 30
2 2016-08-04 2 L 40
3 2016-08-05 1 H 50


Which is very different, and doesn't merge the two data frames by the current and previous row values, so I am unable to figure out how to go about this.

Thank you

Answer

Here is a dplyr - tidyr approach. Assuming the Date is always consecutive, you can create an extra column on the end frame, join back with the origin and then reshape accordingly:

library(dplyr)
library(tidyr)

end %>% 
        arrange(Date) %>% 
        mutate(Prev = lag(N)) %>% 
        inner_join(origin, by = "Date") %>% 
        gather(day, N, -Date, -L, -Type) %>% 
        select(-day) %>% 
        arrange(Date)

#         Date L Type  N
# 1 2016-08-03 3    H 30
# 2 2016-08-03 3    H 20
# 3 2016-08-04 2    L 40
# 4 2016-08-04 2    L 30
# 5 2016-08-05 1    H 50
# 6 2016-08-05 1    H 40

A corresponding data.table solution:

setDT(origin)
setDT(end)
setorder(end, Date)
melt(end[, Prev := shift(N)][origin, on = "Date"], measure.vars = c("N", "Prev"), 
     value.name = "N")[, variable := NULL][order(Date)]

#          Date L Type  N
# 1: 2016-08-03 3    H 30
# 2: 2016-08-03 3    H 20
# 3: 2016-08-04 2    L 40
# 4: 2016-08-04 2    L 30
# 5: 2016-08-05 1    H 50
# 6: 2016-08-05 1    H 40
Comments