Nik Bernou - 3 months ago
R Question

How to find the index of matches between two data frames

I have two data frames, one of which is my reference:

df1<- structure(list(V1 = structure(c(2L, 14L, 8L, 12L, 1L, 3L, 4L,
5L, 6L, 9L, 10L, 16L, 7L, 15L, 11L, 13L), .Label = c("A", "AbC",
"B", "C", "D", "F", "FFFS", "G6_7", "GI666", "GTJJJ", "HINDO",
"MirTn", "Mumbai", "NdFi1", "TRS100", "TTTNKK"), class = "factor"),
V2 = c(10L, 22L, 33L, 35L, 89L, 6L, 973L, 686L, 82L, 22L,
1L, 82L, 1L, 9304L, 43L, 736L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-16L))


df2<- structure(list(V1 = structure(c(1L, 4L, 5L, 3L, 2L, 6L), .Label = c("AbC",
"Bangalore", "Dehli", "F", "GI666", "Mumbai"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA,
-6L))


For each value in df2$V1, I want the index of its match in df1$V1, and a dash for values that have no match.

I tried the following without success, because R recycles the indices over the column:

df1$myindex <- as.character(which(df1$V1 %in% df2$V1))


What I am looking for is shown below:

# V1 myindex
#1 AbC 1
#2 F 9
#3 GI666 10
#4 Dehli -
#5 Bangalore -
#6 Mumbai 16

Answer

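You can use match, which returns, for each element of df2$V1, the position of its first match in df1$V1 (NA if there is none):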

match(df2$V1, df1$V1)
#[1]  1  9 10 NA NA 16
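
To see why match works where the which/%in% attempt in the question did not, here is a minimal sketch on made-up vectors (ref and query are hypothetical names standing in for df1$V1 and df2$V1, not objects from the question). which(ref %in% query) only lists the positions in ref that matched, so it has no slot for non-matches and its length generally differs from both inputs. With the question's data it yields 4 indices, which R silently recycles across the 16 rows of df1.

```r
# Hypothetical toy vectors standing in for df1$V1 (ref) and df2$V1 (query)
ref   <- c("AbC", "X", "F", "Mumbai")
query <- c("AbC", "F", "Dehli", "Mumbai")

# which() + %in% reports which positions of ref occur in query:
# the result is shorter than ref and has no slot for non-matches
pos_in_ref <- which(ref %in% query)
pos_in_ref
# [1] 1 3 4

# match() returns one entry per element of query: the position of its
# first match in ref, or NA when there is none
idx <- match(query, ref)
idx
# [1]  1  3 NA  4
```

Because the result of match always has the same length as its first argument, it can be assigned directly as a new column of the data frame being looked up.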

If you want "-" instead of NA, you can use ifelse (note that mixing numbers with "-" makes the column character):

i1 <- match(df2$V1, df1$V1)
df2$myindex <- ifelse(is.na(i1), "-", i1)
df2
#         V1 myindex
#1       AbC       1
#2         F       9
#3     GI666      10
#4     Dehli       -
#5 Bangalore       -
#6    Mumbai      16
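
Since the dash turns myindex into a character column, one possible variation is to keep the raw positions in a separate integer column and build the dashed version only for display. The sketch below uses made-up data frames (ref, query and the idx column are hypothetical names, not part of the question), so treat it as an illustration rather than the only way to do this.

```r
# Hypothetical toy data, not the question's data frames
ref   <- data.frame(V1 = c("AbC", "X", "F", "Mumbai"), stringsAsFactors = FALSE)
query <- data.frame(V1 = c("AbC", "F", "Dehli", "Mumbai"), stringsAsFactors = FALSE)

# Keep the raw positions as an integer column (NA marks "no match")
query$idx <- match(query$V1, ref$V1)

# Build the display column separately; it is character because of the "-"
query$myindex <- ifelse(is.na(query$idx), "-", as.character(query$idx))

# The integer column can still be used to pull the matched reference rows
matched_rows <- ref[query$idx[!is.na(query$idx)], , drop = FALSE]
```

Keeping idx as integer means it can still be used for subsetting or arithmetic later, while myindex carries the formatting asked for in the question.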