goutam - 7 months ago 57

R Question

I am trying to understand the difference between `match` and `intersect`:

`match(names(set1), names(set2))`

`# [1] NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11`

`intersect(names(set1), names(set2))`

`# [1] "Year" "ID"`

Answer

`match(a, b)` returns an integer vector of `length(a)`, whose `i`-th element gives the first position `j` such that `a[i] == b[j]`. By default, `NA` is produced when there is no match (you can change this with the `nomatch` argument).
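As a small illustration of that description, here is a sketch using made-up vectors (`x` and `tbl` are hypothetical names, not from the question):

```r
# Hypothetical vectors, chosen only for illustration
x   <- c("ID", "Age", "Year", "ID")
tbl <- c("Year", "ID", "ID")

# One position per element of `x`; "ID" occurs twice in `tbl`,
# but only its first position (2) is reported
pos <- match(x, tbl)
pos
# [1]  2 NA  1  2

# Replace the default NA with another integer via `nomatch`
pos0 <- match(x, tbl, nomatch = 0L)
pos0
# [1] 2 0 1 2
```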

If you want the same result as `intersect(a, b)`, use either of the following:

```
b[na.omit(match(a, b))]
a[na.omit(match(b, a))]
```

**Example**

```
a <- 1:5
b <- 2:6
b[na.omit(match(a, b))]
# [1] 2 3 4 5
a[na.omit(match(b, a))]
# [1] 2 3 4 5
```

I just wanted to know if there are any other differences between the two. I was able to understand the results myself.

Then let's read the source code:

```
intersect
#function (x, y)
#{
# y <- as.vector(y)
# unique(y[match(as.vector(x), y, 0L)])
#}
```

It turns out that `intersect` is written in terms of `match`!

Haha, looks like I forgot the `unique` on the outside. Also, by setting `nomatch = 0L` we can get rid of `na.omit`, because indexing by 0 selects nothing. Well, R core is more efficient than my guess.
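To see why both `unique` and `nomatch = 0L` matter, here is a sketch with made-up vectors that contain duplicates. `my_intersect` is a hypothetical name; it simply mirrors the source printed above.

```r
# Hypothetical vectors with duplicates, chosen only for illustration
a <- c(3, 1, 3, 7)
b <- c(1, 3, 5, 3)

# Without `unique`, the duplicated 3 in `a` shows up twice
no_unique <- b[na.omit(match(a, b))]
no_unique
# [1] 3 1 3

# Indexing by 0 selects nothing, so `nomatch = 0L` replaces `na.omit`
my_intersect <- function(x, y) {
  y <- as.vector(y)
  unique(y[match(as.vector(x), y, 0L)])
}
my_intersect(a, b)
# [1] 3 1
intersect(a, b)
# [1] 3 1
```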

**Follow-up**

We could also use

```
a[a %in% b] ## need a `unique`, too
b[b %in% a] ## need a `unique`, too
```

However, have a read of `?match`. In "Details" we can see how `"%in%"` is defined:

```
"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
```

So, yes, everything is written using `match`.
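A quick check of the `%in%` route, again with made-up vectors: the logical mask agrees with the `match`-based definition, and wrapping the subset in `unique` reproduces `intersect(a, b)`.

```r
# Hypothetical vectors with duplicates, chosen only for illustration
a <- c(3, 1, 3, 7)
b <- c(1, 3, 5, 3)

mask_in <- a %in% b
mask_in
# [1]  TRUE  TRUE  TRUE FALSE

# Same mask, written out with `match` as in the "Details" of ?match
mask_match <- match(a, b, nomatch = 0) > 0
mask_match
# [1]  TRUE  TRUE  TRUE FALSE

via_in <- unique(a[a %in% b])
via_in
# [1] 3 1
```

Note that `unique(b[b %in% a])` returns the same elements in the order they appear in `b` (here `1 3`), so the two forms only agree up to ordering.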

Source (Stackoverflow)