Daniel Fletcher - 1 year ago 39

R Question

(I expect this has already been asked/answered. If so, sorry, I'm failing to locate the answer.)

Let's say I have 6 vectors. How can I quickly check whether any element of any vector is equal to any element of any of the other vectors?

I've seen a number of functions for how to do this with *2* vectors, and I'm unsure how to do it quickly with 3+.

I know I could do the following, and it feels really cumbersome/pre-historic/error-prone:

`any(vec1 %in% vec2, vec1 %in% vec3, vec1 %in% vec4, vec1 %in% vec5, vec1 %in% vec6,
vec2 %in% vec3, vec2 %in% vec4, vec2 %in% vec5, vec2 %in% vec6,
vec3 %in% vec4, vec3 %in% vec5, vec3 %in% vec6,
vec4 %in% vec5, vec4 %in% vec6,
vec5 %in% vec6)`

Thanks.

By the way, I checked "How to find common elements from multiple vectors?" and that appears to be asking for how to identify elements that are present in

Answer

If you put your vectors in a list, they'll be substantially easier to work with:

```
# make sample data
set.seed(47)
x <- replicate(6, rpois(3, 10), simplify = FALSE)
str(x)
# List of 6
# $ : int [1:3] 16 12 10
# $ : int [1:3] 9 10 6
# $ : int [1:3] 10 14 4
# $ : int [1:3] 7 6 4
# $ : int [1:3] 12 8 7
# $ : int [1:3] 7 11 8
```
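If the vectors already exist as separate objects, as in the question, they first need to be collected into a list. The following is a minimal sketch, assuming the objects follow a `vec1`, `vec2`, ... naming pattern; the values and the names `vecs` and `vecs_named` are made up for illustration.

```r
# Hypothetical stand-ins for the vec1, vec2, ... objects from the question
vec1 <- c(16, 12, 10)
vec2 <- c(9, 10, 6)
vec3 <- c(10, 14, 4)

# Option 1: list them explicitly
vecs <- list(vec1, vec2, vec3)

# Option 2: look them up by name; mget() returns a named list
vecs_named <- mget(paste0("vec", 1:3))
```

The explicit `list()` call is less fragile; `mget()` only saves typing when the names really do follow a regular pattern.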

Now iterate with `lapply`:

```
lapply(x, function(y){sapply(x, function(z){y %in% z})})
## [[1]]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] TRUE FALSE FALSE FALSE FALSE FALSE
## [2,] TRUE FALSE FALSE FALSE TRUE FALSE
## [3,] TRUE TRUE TRUE FALSE FALSE FALSE
##
## [[2]]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] FALSE TRUE FALSE FALSE FALSE FALSE
## [2,] TRUE TRUE TRUE FALSE FALSE FALSE
## [3,] FALSE TRUE FALSE TRUE FALSE FALSE
## ... ... ... ... ... ... ...
```

This returns one matrix per vector: the rows are the elements of that vector, the columns are the vectors in the list, and each value indicates whether that element is in that vector. Each vector matches itself, so the first column of the first matrix is all `TRUE`, as is the second column of the second matrix, and so on. Any other `TRUE` indicates a cross-vector match. Because `y %in% z` always has the same length as `y`, vectors of different lengths still produce matrices, just with different numbers of rows (a length-one vector gives a plain logical vector instead). If you'd rather have a nested list, change `sapply` to `lapply`.
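Since the self-match column is always `TRUE`, it can help to leave each vector out of its own comparison. This is a rough sketch rather than part of the original answer: it hard-codes the sample values printed by `str(x)` above, and the name `shared_elsewhere` is made up for illustration.

```r
# Sample data copied from the str() output above
x <- list(c(16L, 12L, 10L), c(9L, 10L, 6L), c(10L, 14L, 4L),
          c(7L, 6L, 4L), c(12L, 8L, 7L), c(7L, 11L, 8L))

# For vector i, flag each element that also appears in at least one OTHER vector.
# x[-i] drops vector i, so a vector is never compared against itself.
shared_elsewhere <- lapply(seq_along(x), function(i) {
  x[[i]] %in% unlist(x[-i])
})

shared_elsewhere[[1]]
# FALSE TRUE TRUE  (16 is unique to the first vector; 12 and 10 are not)
```

This collapses each matrix to one logical value per element, so it no longer says which vector the match came from.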

Alternatively, if you just want the indices of the matching vectors for each vector:

```
str(lapply(x, function(y){which(sapply(x, function(z){any(y %in% z)}))}))
## List of 6
## $ : int [1:4] 1 2 3 5
## $ : int [1:4] 1 2 3 4
## $ : int [1:4] 1 2 3 4
## $ : int [1:5] 2 3 4 5 6
## $ : int [1:4] 1 4 5 6
## $ : int [1:3] 4 5 6
```

where each element still contains itself as a match. Take out `which` for Booleans instead of indices.
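If all that is needed is the single `TRUE`/`FALSE` from the question, the pairwise checks can be generated instead of typed out. The sketch below is one possible way to do that, not the original answer's method; it reuses the sample values from above, and the names `pairs`, `shares`, `any_shared` and `any_shared_fast` are illustrative.

```r
# Sample data copied from the str() output above
x <- list(c(16L, 12L, 10L), c(9L, 10L, 6L), c(10L, 14L, 4L),
          c(7L, 6L, 4L), c(12L, 8L, 7L), c(7L, 11L, 8L))

# Every unordered pair of vector indices: a 2 x 15 matrix for 6 vectors
pairs <- combn(length(x), 2)

# One logical per pair: do the two vectors have any element in common?
shares <- apply(pairs, 2, function(p) any(x[[p[1]]] %in% x[[p[2]]]))

any_shared <- any(shares)   # the single answer asked for in the question
pairs[, shares]             # which pairs of vectors overlap

# Shortcut when only the overall answer matters: deduplicate within each
# vector, pool everything, and look for any remaining duplicate
any_shared_fast <- anyDuplicated(unlist(lapply(x, unique))) > 0
```

The `unique` step matters: without it, a value repeated inside a single vector would be counted as a cross-vector match.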