Dhiraj Dhiraj - 3 years ago 137
R Question

Merging two dataframes on multiple columns

I have two dataframes that I am trying to merge :

set.seed(123)
df1 <- data.frame(ID=sample(letters[1:6],10,replace=TRUE))
df2 <- data.frame(
ID1 = letters[1:2],
ID2 = letters[3:4],
ID3 = letters[5:6],
V1 = c(23.32,21.24),
V2 = c(45.32,47.21)
)


Post merging, I want my
df1
to contain the columns
V1
and
V2
along with
ID
. I have tried using
merge
,
left_join
and
inner_join
(from
dplyr
) but can't figure out how to use the
by
argument. The
ID
column from
df1
could exist in any of the three columns (
ID1
,
ID2
and
ID3
) of
df2
. How can I achieve this?

Answer Source

You have to reshape in long format first, then join:

library(dplyr)
library(tidyr)

df2 %>% 
  gather(IDnr, ID, 1:3) %>% 
  left_join(df1, ., by = 'ID')

# alternative:
df1 %>% 
  left_join(., df2 %>% gather(IDnr, ID, 1:3), by = 'ID')

The result:

   ID    V1    V2 IDnr
1   d 21.24 47.21  ID2
2   e 23.32 45.32  ID3
3   f 21.24 47.21  ID3
4   d 21.24 47.21  ID2
5   f 21.24 47.21  ID3
6   c 23.32 45.32  ID2
7   a 23.32 45.32  ID1
8   e 23.32 45.32  ID3
9   a 23.32 45.32  ID1
10  d 21.24 47.21  ID2
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download