Richard - 1 month ago
R Question

How can I find the indices of the intersection of data frames in R efficiently?

I have the following setting (a toy example of my real problem):

data1 = data.frame(cbind(1:8,1:8+3,1:8+5))
data2 = data.frame(rbind(c(4,7,9),c(7,10,12)))


thus

> data1
  X1 X2 X3
1  1  4  6
2  2  5  7
3  3  6  8
4  4  7  9
5  5  8 10
6  6  9 11
7  7 10 12
8  8 11 13


and

> data2
  X1 X2 X3
1  4  7  9
2  7 10 12


How can I find the indices of the rows of data2 in data1 efficiently? The result in the above example should be c(4,7).
I tried looping but this is just too inefficient. Thanks for any help!

Answer

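We can paste each row into a single string and use which with %in% to find the rows of data1 whose string also occurs in data2: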

which(do.call(paste, data1) %in% do.call(paste, data2))
#[1] 4 7
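
To make the paste step a bit more explicit, here is a minimal sketch on the toy data from the question. The names key1, key2, idx_by_data1 and idx_by_data2 are made up for illustration, and the "\r" separator is just an assumption: any string that cannot occur inside the values should do, and it only matters when columns contain text where the default space separator could make two different rows paste to the same key.

```r
# Toy data from the question
data1 <- data.frame(cbind(1:8, 1:8 + 3, 1:8 + 5))
data2 <- data.frame(rbind(c(4, 7, 9), c(7, 10, 12)))

# One string key per row; "\r" is an arbitrary separator (assumption)
key1 <- do.call(paste, c(data1, sep = "\r"))
key2 <- do.call(paste, c(data2, sep = "\r"))

# Positions in data1 whose key occurs in data2, in data1 order
idx_by_data1 <- which(key1 %in% key2)

# Position in data1 of each row of data2, in data2 order (NA if a row is absent)
idx_by_data2 <- match(key2, key1)
```

which(... %in% ...) reports the matches in the order of data1, while match() reports one index per row of data2, so the second form is probably the one to use if you need to know which row of data2 matched where.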

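Or do a join with data.table, keeping the row names of data1 in an rn column (note that setDT converts data1 in place and rn comes back as character):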

library(data.table)
setDT(data1, keep.rownames = TRUE)[data2, on = names(data2)]$rn
#[1] "4" "7"
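
If you would rather not load a package or modify data1, the same join idea can be sketched in base R with merge(). This is a swapped-in alternative, not the data.table call above; d1, row_id, and idx are hypothetical names chosen for the example.

```r
# Toy data from the question
data1 <- data.frame(cbind(1:8, 1:8 + 3, 1:8 + 5))
data2 <- data.frame(rbind(c(4, 7, 9), c(7, 10, 12)))

# Work on a copy that carries the original row position along
d1 <- cbind(data1, row_id = seq_len(nrow(data1)))

# Inner join on all columns of data2; only matching rows survive
merged <- merge(d1, data2, by = names(data2))

# Integer row positions in data1, sorted ascending
idx <- sort(merged$row_id)
```

Here idx is an integer vector rather than character, and data1 itself is left untouched. On large data frames merge() is likely slower than the data.table join, so treat this as a convenience version rather than the efficient one.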