Sumer Vaid - 12 days ago
R Question

Subtracting a smaller data frame from a larger data frame in R without a unique row ID

I have two data frames in R: Large and Small. The smaller one is contained in the larger one. Importantly, there are no unique identifiers for each row in either data frame. How can I obtain the following:


Large - Small [large minus small]


Small data frame (SmallDF):

ID CSF1PO CSF1PO.1 D10S1248 D10S1248.1 D12S391 D12S391.1
203079 10 11 14 16 -9 -9
203079 8 12 14 17 -9 -9
203080 10 12 13 13 -9 -9


Large data frame (BigDF):

ID CSF1PO CSF1PO.1 D10S1248 D10S1248.1 D12S391 D12S391.1
203078 -9 -9 15 15 18 20
203078 -9 -9 14 15 17 19
203079 10 11 14 16 -9 -9
203079 8 12 14 17 -9 -9
203080 10 12 13 13 -9 -9
203080 10 11 14 16 -9 -9
203081 10 12 14 16 -9 -9
203081 11 12 15 16 -9 -9
203082 11 11 13 15 -9 -9
203082 11 11 13 14 -9 -9


The small data frame corresponds to rows 3, 4, and 5 of the larger data frame.

I have tried the following.

BigDF[ !(BigDF$ID %in% SmallDF$ID), ]


This doesn't work because there are no unique identifiers for the rows in either data frame. The output I get is exactly the same as BigDF.

I have also tried the following.

library(dplyr)
setdiff(BigDF, SmallDF)


The output I receive is exactly the same as BigDF.

Any help would be appreciated! Thanks.

Answer
library(dplyr)
anti_join(BigDF, SmallDF)

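When no by argument is given, anti_join matches on all columns the two data frames have in common, so this is equivalent to: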

anti_join(BigDF, SmallDF, by=c("ID", "CSF1PO", "CSF1PO.1", "D10S1248", "D10S1248.1", "D12S391", "D12S391.1"))
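
For anyone who wants to try this, here is a minimal reproducible sketch built from the sample data in the question. It assumes every column is numeric and that SmallDF is exactly rows 3 to 5 of BigDF, as the question describes; the real data may be typed differently.

```r
library(dplyr)

# Sample data copied from the question (assumed to be all numeric).
BigDF <- data.frame(
  ID         = c(203078, 203078, 203079, 203079, 203080, 203080, 203081, 203081, 203082, 203082),
  CSF1PO     = c(-9, -9, 10, 8, 10, 10, 10, 11, 11, 11),
  CSF1PO.1   = c(-9, -9, 11, 12, 12, 11, 12, 12, 11, 11),
  D10S1248   = c(15, 14, 14, 14, 13, 14, 14, 15, 13, 13),
  D10S1248.1 = c(15, 15, 16, 17, 13, 16, 16, 16, 15, 14),
  D12S391    = c(18, 17, -9, -9, -9, -9, -9, -9, -9, -9),
  D12S391.1  = c(20, 19, -9, -9, -9, -9, -9, -9, -9, -9)
)

# The question states that SmallDF matches rows 3, 4 and 5 of BigDF.
SmallDF <- BigDF[3:5, ]

# Keep only the rows of BigDF that have no match in SmallDF,
# comparing on every column.
result <- anti_join(BigDF, SmallDF, by = names(BigDF))

nrow(result)  # 7 rows remain out of the original 10
```

Note that row 6 of BigDF (ID 203080) is kept even though it shares its ID with a row in SmallDF, because the comparison uses all columns rather than ID alone.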

If two variables are enough to uniquely identify a row, you can pass just those variables in the vector given to by:

anti_join(BigDF, SmallDF, by=c("ID", "CSF1PO.1"))
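
Before joining on a subset of columns, it may be worth checking that the subset really does identify rows uniquely. Below is a rough sketch using base R's anyDuplicated on just the two key columns of the sample BigDF. The name BigKeys is made up for this illustration.

```r
# The ID and CSF1PO.1 columns of BigDF, copied from the question.
# BigKeys is a hypothetical name used only for this check.
BigKeys <- data.frame(
  ID       = c(203078, 203078, 203079, 203079, 203080, 203080, 203081, 203081, 203082, 203082),
  CSF1PO.1 = c(-9, -9, 11, 12, 12, 11, 12, 12, 11, 11)
)

# anyDuplicated() returns 0 when every row is distinct, otherwise the
# index of the first row that repeats an earlier one.
key_is_unique <- anyDuplicated(BigKeys) == 0

key_is_unique  # FALSE: the two rows for ID 203078 share the same key
```

In the sample data this pair of columns is not unique within BigDF, so the shorter by vector is only safe when none of the repeated keys also appear in SmallDF. If a repeated key did appear there, every BigDF row sharing it would be dropped. Joining on all columns avoids that risk.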