goutam goutam - 16 days ago 4
R Question

Difference between intersect and match in R

I am trying to understand the difference between match and intersect in R. Both seem to return the same information in different formats. Are there any functional differences between the two?

match(names(set1), names(set2))
# [1] NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11

intersect(names(set1), names(set2))
# [1] "Year" "ID"

Answer

match(a, b) returns an integer vector of length(a), with the i-th element giving the position j of the first match such that a[i] == b[j]. NA is produced by default when there is no match (although you can customize it through the nomatch argument).
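As a small illustration of that description, here is a sketch with two made-up character vectors (a and b below are hypothetical, not the asker's data). It shows that only the first matching position is reported and that nomatch replaces the default NA:

```r
a <- c("x", "y", "z", "y")
b <- c("y", "q", "y")

# For each element of a, the position of its FIRST match in b
m <- match(a, b)
print(m)
# [1] NA  1 NA  1

# nomatch replaces the default NA with a value of your choice
m0 <- match(a, b, nomatch = 0L)
print(m0)
# [1] 0 1 0 1
```

Note that "y" appears twice in b, yet match only ever reports position 1.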

If you want to get the same elements as intersect(a, b) (assuming neither vector contains duplicates), use either of the following:

b[na.omit(match(a, b))]
a[na.omit(match(b, a))]

Example

a <- 1:5
b <- 2:6

b[na.omit(match(a, b))]
# [1] 2 3 4 5

a[na.omit(match(b, a))]
# [1] 2 3 4 5

I just wanted to know if there are any other differences between the two. I was able to understand the results myself.

Then let's read the source code:

intersect
#function (x, y) 
#{
#    y <- as.vector(y)
#    unique(y[match(as.vector(x), y, 0L)])
#}

It turns out that intersect is written in terms of match!

Haha, looks like I forgot the unique on the outside. Also, by setting nomatch = 0L we can get rid of na.omit, because a zero index selects nothing. Well, R core's version is more efficient than my guess.
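To see why the outer unique matters, here is a minimal sketch using hypothetical vectors that do contain duplicates (unlike the 1:5 and 2:6 example earlier). It compares the na.omit version with the nomatch = 0L plus unique version:

```r
a <- c(3, 1, 3, 2, 5)
b <- c(5, 3, 3, 7)

# Without unique, the duplicate 3 in a shows up twice in the result
no_unique <- b[na.omit(match(a, b))]
print(no_unique)
# [1] 3 3 5

# nomatch = 0L: a zero index selects nothing, so na.omit is not needed
with_unique <- unique(b[match(a, b, nomatch = 0L)])
print(with_unique)
# [1] 3 5

print(intersect(a, b))
# [1] 3 5
```

So for vectors with repeated values, only the version wrapped in unique agrees with intersect.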


Follow-up

We could also use

a[a %in% b]  ## need a `unique`, too
b[b %in% a]  ## need a `unique`, too

However, have a read of ?match. In "Details" we can see how "%in%" is defined:

"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0

So, yes, everything is written using match.
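A quick way to convince yourself is to rebuild the membership test by hand, following the definition quoted above, and compare it with %in%. The vectors are again hypothetical:

```r
a <- c(3, 1, 3, 2, 5)
b <- c(5, 3, 3, 7)

# Membership test written directly in terms of match
in_by_hand <- match(a, b, nomatch = 0) > 0
print(identical(in_by_hand, a %in% b))
# [1] TRUE

# With unique added, both subsetting forms give the common elements
from_a <- unique(a[a %in% b])
from_b <- unique(b[b %in% a])
print(from_a)
# [1] 3 5
print(from_b)
# [1] 5 3
```

The two forms return the same set, but the order follows whichever vector is being subset; intersect(a, b) follows the order of a.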