xjtc55 xjtc55 - 12 days ago 7
R Question

R Efficient way to create new data frame from unique rows between two data frames

I need to create a new data frame from two existing data frames, where the new data frame contains each row of the first data frame that is not in the second. I found some code here that uses the merge function to do this. Basically, if merging a row with the second data frame returns any rows, that row is already in the second data frame and I don't add it to my new one:

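test.df <- my.df[0, ]  # empty data frame with the same columns as my.df
for (j in seq_len(nrow(my.df))) {
  # merge() returns zero rows when row j has no match in sample.df
  if (nrow(merge(my.df[j, ], sample.df)) == 0) {
    test.df <- rbind(test.df, my.df[j, ])
  }
}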


The problem is that the for loop is very slow. Is there a more efficient way to build a data frame given the constraints I have?

my.df


A B class
1 2 x
2 3 y
3 4 z


sample.df


A B class
1 2 x


test.df should look like

A B class
2 3 y
3 4 z

Answer

With library(dplyr), we can use anti_join(), which returns the rows of my.df that have no match in sample.df:

anti_join(my.df, sample.df)
# Joining, by = c("A", "B", "class")
#   A B class
# 1 3 4     z
# 2 2 3     y
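
For a reproducible version of the call above, here is a minimal sketch that rebuilds the example data from the question and names the join columns explicitly through the by argument, so dplyr does not have to guess them. The stringsAsFactors = FALSE setting is an assumption, made so that class is compared as plain character data.

```r
library(dplyr)

# Example data from the question; stringsAsFactors = FALSE is an assumption
my.df <- data.frame(A = c(1, 2, 3), B = c(2, 3, 4),
                    class = c("x", "y", "z"), stringsAsFactors = FALSE)
sample.df <- data.frame(A = 1, B = 2, class = "x", stringsAsFactors = FALSE)

# Keep the rows of my.df that have no match in sample.df on all three columns
test.df <- anti_join(my.df, sample.df, by = c("A", "B", "class"))
print(test.df)
```

The order of the returned rows may differ between dplyr versions, so sort the result explicitly if the order matters.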

As mentioned by @Gregor, you can also convert your data.frames into data.tables with library(data.table) to get some extra speed.
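
As a rough sketch of the data.table route, the same anti-join can be written with the not-join form X[!Y, on = ...]. The names my.dt, sample.dt, and test.dt are made up for this example, and the data is rebuilt from the question with stringsAsFactors = FALSE as an assumption.

```r
library(data.table)

# Example data from the question; stringsAsFactors = FALSE is an assumption
my.df <- data.frame(A = c(1, 2, 3), B = c(2, 3, 4),
                    class = c("x", "y", "z"), stringsAsFactors = FALSE)
sample.df <- data.frame(A = 1, B = 2, class = "x", stringsAsFactors = FALSE)

# as.data.table() makes a copy and leaves the original data frames untouched
my.dt <- as.data.table(my.df)
sample.dt <- as.data.table(sample.df)

# Not-join: rows of my.dt with no match in sample.dt on the listed columns
test.dt <- my.dt[!sample.dt, on = c("A", "B", "class")]
print(test.dt)
```

Whether this is noticeably faster than anti_join() depends on the size of the data, so it is worth timing both on your own data frames.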
