goutam - 1 year ago 124
R Question

# Difference between intersect and match in R

I am trying to understand the difference between `match` and `intersect` in R. Both seem to return the same result in a different format. Are there any functional differences between the two?

```r
match(names(set1), names(set2))
#  [1] NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11

intersect(names(set1), names(set2))
# [1] "Year"     "ID"
```

`match(a, b)` returns an integer vector of `length(a)`, with the `i`-th element giving the first position `j` such that `a[i] == b[j]`. Elements with no match produce `NA` by default (you can change this with the `nomatch` argument).
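As a small illustration of that behaviour (the vectors `x` and `tbl` are made up for this sketch, not taken from the question), the result has one entry per element of the first argument, and `nomatch` replaces the default `NA`:

```r
x   <- c("ID", "Name", "Year")
tbl <- c("Year", "Value", "ID", "ID")   # hypothetical lookup table

# One entry per element of x; only the first matching position is reported
pos <- match(x, tbl)
pos
# [1]  3 NA  1

# nomatch replaces the default NA for elements with no match
pos0 <- match(x, tbl, nomatch = 0L)
pos0
# [1] 3 0 1
```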

If you want to get the same elements as `intersect(a, b)` (assuming neither vector contains duplicates), use either of the following:

```r
b[na.omit(match(a, b))]
a[na.omit(match(b, a))]
```

Example

```r
a <- 1:5
b <- 2:6

b[na.omit(match(a, b))]
# [1] 2 3 4 5

a[na.omit(match(b, a))]
# [1] 2 3 4 5
```

I just wanted to know if there are any other differences between the two. I was able to understand the results myself.

```r
intersect
#function (x, y)
#{
#    y <- as.vector(y)
#    unique(y[match(as.vector(x), y, 0L)])
#}
```

It turns out that `intersect` is written in terms of `match`!

Haha, looks like I forgot the `unique` on the outside. Also, by setting `nomatch = 0L` we can get rid of `na.omit`, because indexing with zero selects nothing. Well, R core is more efficient than my guess.
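To see why the `unique` matters, here is a quick sketch with duplicated values (the vectors are invented for illustration):

```r
a <- c(2, 2, 3, 5, 7)
b <- c(7, 2, 2, 9)

# nomatch = 0L gives a zero index for non-matches, and a zero index selects nothing
via_match <- b[match(a, b, nomatch = 0L)]
via_match
# [1] 2 2 7

# Without unique(), the duplicated 2 in `a` appears twice;
# with it, the result agrees with intersect(a, b)
unique(via_match)
# [1] 2 7
intersect(a, b)
# [1] 2 7
```

Note that the result follows the order of the first argument, so swapping `a` and `b` can give the same elements in a different order.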

Follow-up

We could also use

```r
a[a %in% b]  ## need a `unique`, too
b[b %in% a]  ## need a `unique`, too
```

However, have a look at `?match`. In "Details" we can see how `"%in%"` is defined:

```r
"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
```

So, yes, everything is written using `match`.
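A quick way to convince yourself, again with made-up vectors, is to compare `%in%` against the `match` expression directly:

```r
a <- c(1, 2, 2, 5)
b <- c(2, 5, 6)

in_op    <- a %in% b
in_match <- match(a, b, nomatch = 0L) > 0L

in_op
# [1] FALSE  TRUE  TRUE  TRUE

# Both approaches give the same logical vector
identical(in_op, in_match)
# [1] TRUE

# Subsetting with %in% plus unique() gives the common elements
common <- unique(a[a %in% b])
common
# [1] 2 5
```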
