Prradep Prradep - 2 months ago 7
R Question

How to retrieve multiple matching elements?

I have to two data frames

df_a
,
df_b
.

set.seed(143)
df_a <- data.frame(colA = sample(1:10, 10, replace=T), colB = sample(LETTERS[1:20],10))
df_a
# colA colB
#1 10 I
#2 1 D
#3 8 R
#4 5 F
#5 1 N
#6 10 P
#7 7 E
#8 6 S
#9 6 T
#10 4 C

df_b <- data.frame(colA = sample(1:10, 10, replace=T))
df_b
# colA
#1 9
#2 3
#3 9
#4 9
#5 3
#6 10
#7 10
#8 7
#9 4
#10 7


I have to update the
colB
with values from
colB
in data frame
df_a
based on the matching of the
colA
in both data frames.

df_a[match(df_b$colA, df_a$colA),'colB']
# [1] <NA> <NA> <NA> <NA> <NA> I I E C E
#Levels: C D E F I N P R S T


The output doesn't give more than one matched element(not a surprise!). For eg.,
10
has two values
I
,
P
but only gives
I
.

Expected Output(something like this, maybe?):

df_a[match(df_b$colA, df_a$colA),'colB']
# [1] <NA> <NA> <NA> <NA> <NA> I,P I,P E C E
#Levels: C D E F I N P R S T


As the match() function returns only the first matched value, is there any other alternative like
which()
or
%in%
for achieving the task ?

Answer

You need to aggregate to paste same matches together, and then merge, i.e.

merge(df_b, aggregate(colB ~ colA, df_a, paste, collapse = ','), by = 'colA', all.x = TRUE)
#   colA colB
#1     3 <NA>
#2     3 <NA>
#3     4    C
#4     7    E
#5     7    E
#6     9 <NA>
#7     9 <NA>
#8     9 <NA>
#9    10  I,P
#10   10  I,P