Tal Galili Tal Galili - 2 months ago 11
R Question

Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2

I have the following 2 data.frames:

a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3])


I want to find the row a1 has that a2 doesn't.

Is there a built in function for this type of operation?

(p.s: I did write a solution for it, I am simply curious if someone already made a more crafted code)

Here is my solution:

a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3])

rows.in.a1.that.are.not.in.a2 <- function(a1,a2)
{
a1.vec <- apply(a1, 1, paste, collapse = "")
a2.vec <- apply(a2, 1, paste, collapse = "")
a1.without.a2.rows <- a1[!a1.vec %in% a2.vec,]
return(a1.without.a2.rows)
}
rows.in.a1.that.are.not.in.a2(a1,a2)

Answer

This doesn't answer your question directly, but it will give you the elements that are in common. This can be done with Paul Murrell's package compare:

library(compare)
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])
comparison <- compare(a1,a2,allowAll=TRUE)
comparison$tM
#  a b
#1 1 a
#2 2 b
#3 3 c

The function compare gives you a lot of flexibility in terms of what kind of comparisons are allowed (e.g. changing order of elements of each vector, changing order and names of variables, shortening variables, changing case of strings). From this, you should be able to figure out what was missing from one or the other. For example (this is not very elegant):

difference <-
   data.frame(lapply(1:ncol(a1),function(i)setdiff(a1[,i],comparison$tM[,i])))
colnames(difference) <- colnames(a1)
difference
#  a b
#1 4 d
#2 5 e
Comments