Phaml Phaml - 1 month ago 10
R Question

merge data by groups and by common ID (IDs duplicated outside groups)

This is not a duplicated question to How to join (merge) data frames. You can perform the left.merge inside the group but not to the whole data set. The ids are unique inside group, not acroos group. By not grouping and using a left.merge, you willl mess up the data.

I have a data with many groups (Panel data/Time seriers). Within the group, I want to merge the data by a common ID. And apply the same merge across all the groups that I have(same

merge
for all other groups).

#sample data
a<-data.frame(c(1:4,1:4),1,c('a','a','a','a','b','b','b','b'))
b<-data.frame(c(2,4,2,4),10,c('a','a','b','b'))

colnames(a)<-c('id','v','group')
colnames(b)<-c('id','v1','group')

> a
id v group
1 1 1 a
2 2 1 a
3 3 1 a
4 4 1 a
5 1 1 b
6 2 1 b
7 3 1 b
8 4 1 b
> b
id v1 group
1 2 10 a
2 4 10 a
3 2 10 b
4 4 10 b


I tried to use the dplyr
group_by (group)
and then
merge(a,b,by='id',all.x=TRUE)
, but not sure how to apply dplyr to two data sets

desired output (left merge)

id v group.x v1 group.y
1 1 a NA <NA>
2 1 a 10 a
3 1 a NA <NA>
4 1 a 10 a
1 1 b NA <NA>
2 1 b 10 b
3 1 b NA <NA>
4 1 b 10 b

Answer

You can just include group in the by argument for the join:

a %>% left_join(b, by=c("id","group"))
  id v group v1
1  1 1     a NA
2  2 1     a 10
3  3 1     a NA
4  4 1     a 10
5  1 1     b NA
6  2 1     b 10
7  3 1     b NA
8  4 1     b 10

This gives you only one "group" column, but v1 will be NA for cases where there's no matching row in b, so creating two separate "group" columns is redundant. Isn't that better, given that group (presumably) represents the same underlying division of the data in both data frames?

Comments