Zheyuan Li - 1 year ago 43

R Question

This question is motivated by How can I quickly see if any elements of multiple vectors are equal in R?, but it is not identical to it or a duplicate of it.

As a small example, suppose we have a list of 4 vectors:

`set.seed(0)`

`lst <- list(vec1 = sample(1:10, 2, TRUE), vec2 = sample(1:10, 3, TRUE), vec3 = sample(1:10, 4, TRUE), vec4 = sample(1:10, 5, TRUE))`

How can we perform pairwise binary operations like `%in%`, `intersect`, `union` and `setdiff`?

Suppose we want pairwise `"%in%"`: how can we then apply `any()`, `all()` or `which()` to the result of each pair?

Note: I don't want to use `combn()`.

Answer Source

We could use `outer(x, y, FUN)`. `x` and `y` need not be "numeric" input like a numeric vector / matrix; a vector input like a "list" / "matrix list" is also allowed.

For example, to apply the pairwise `"%in%"` operation, we use:

```
z <- outer(lst, lst, FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))
# vec1 vec2 vec3 vec4
#vec1 Logical,2 Logical,2 Logical,2 Logical,2
#vec2 Logical,3 Logical,3 Logical,3 Logical,3
#vec3 Logical,4 Logical,4 Logical,4 Logical,4
#vec4 Logical,5 Logical,5 Logical,5 Logical,5
```

Since `"%in%"` itself is not vectorized over the elements of a list, we use `Vectorize("%in%")`. We also need `SIMPLIFY = FALSE`, so that `FUN` returns a length-1 list for each pair `(x[[i]], y[[j]])`. This is important, as `outer` works like:

```
y[[4]] | FUN(x[[1]], y[[4]]) FUN(x[[2]], y[[4]]) FUN(x[[3]], y[[4]]) FUN(x[[4]], y[[4]])
y[[3]] | FUN(x[[1]], y[[3]]) FUN(x[[2]], y[[3]]) FUN(x[[3]], y[[3]]) FUN(x[[4]], y[[3]])
y[[2]] | FUN(x[[1]], y[[2]]) FUN(x[[2]], y[[2]]) FUN(x[[3]], y[[2]]) FUN(x[[4]], y[[2]])
y[[1]] | FUN(x[[1]], y[[1]]) FUN(x[[2]], y[[1]]) FUN(x[[3]], y[[1]]) FUN(x[[4]], y[[1]])
------------------- ------------------- ------------------- -------------------
x[[1]] x[[2]] x[[3]] x[[4]]
```

It must be satisfied that `length(FUN(x, y)) == length(x) * length(y)`. With the default `SIMPLIFY = TRUE`, this does not necessarily hold, because results that share a common length greater than 1 are collapsed into a matrix.
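To see why the length requirement matters, here is a minimal sketch on a small hypothetical list (`demo` and the helper names are assumptions, not part of the example above). Both elements of `demo` have length 2, so the default `SIMPLIFY = TRUE` collapses the results into a matrix and `outer()` fails, while `SIMPLIFY = FALSE` keeps one list element per pair:

```r
# Hypothetical input: both elements have length 2, so mapply() can simplify.
demo <- list(a = 1:2, b = 3:4)

f_simplified <- Vectorize("%in%")                    # SIMPLIFY = TRUE by default
f_listed     <- Vectorize("%in%", SIMPLIFY = FALSE)  # always returns a list

# outer() evaluates FUN on all 2 * 2 = 4 pairs at once and expects length 4.
expanded_x <- rep(demo, times = 2)
expanded_y <- rep(demo, each = 2)
len_simplified <- length(f_simplified(expanded_x, expanded_y))  # 2 x 4 matrix
len_listed     <- length(f_listed(expanded_x, expanded_y))      # list of 4 results

# The simplified version therefore errors inside outer(); the list version works.
res_bad <- tryCatch(outer(demo, demo, f_simplified), error = function(e) e)
res_ok  <- outer(demo, demo, f_listed)
```

Here `res_bad` ends up holding the error condition raised by `outer()`, and `res_ok` is a 2 x 2 "matrix list".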

The result `z` above is a "matrix list", with `class(z)` being "matrix" (`c("matrix", "array")` since R 4.0.0), but `typeof(z)` being "list". Read Why is this matrix not numeric? for more.
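As a rough illustration of what a "matrix list" looks like in practice, the sketch below builds one from a small hypothetical list (`demo` and `z_demo` are assumed names with fixed values, so the result does not depend on the random seed) and extracts a single cell:

```r
# Hypothetical input with fixed values.
demo <- list(a = c(1, 2), b = c(2, 3, 4), c = c(5, 6))
z_demo <- outer(demo, demo,
                FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))

type_z <- typeof(z_demo)  # storage type is a list ...
dim_z  <- dim(z_demo)     # ... yet it carries matrix dimensions

# [[i, j]] extracts the content of one cell, here demo$a %in% demo$b.
cell_ab <- z_demo[[1, 2]]

# [i, j] keeps the list wrapper, like single-bracket indexing of any list.
wrapped_ab <- z_demo[1, 2]
```

The distinction between `[[i, j]]` and `[i, j]` is the same as for an ordinary list, only with two indices.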

If we want to further apply some summary function to each element of `z`, we could use `lapply`. Here I offer two examples.

**Example 1: Apply any()**

Since `any(a %in% b)` is the same as `any(b %in% a)`, i.e., the operation is symmetric, we only need to work with the lower triangle of `z`:

```
lz <- z[lower.tri(z)]
```

`lapply` returns an unnamed list, but for readability we want a named list. We may use the matrix index `(i, j)` as the name:

```
ind <- which(lower.tri(z), arr.ind = TRUE)
NAME <- paste(ind[,1], ind[,2], sep = ":")
any_lz <- setNames(lapply(lz, any), NAME)
str(any_lz)
#List of 6
# $ 2:1: logi FALSE
# $ 3:1: logi TRUE
# $ 4:1: logi TRUE
# $ 3:2: logi TRUE
# $ 4:2: logi FALSE
# $ 4:3: logi TRUE
```
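If a compact overview is preferred over a named list, one possible variation (not part of the approach above) is to summarise every cell with `vapply()` and put the result back into matrix shape. This is a sketch on a hypothetical list `demo` with fixed values; `any_mat` is an assumed name:

```r
# Hypothetical input with fixed values.
demo <- list(a = c(1, 2), b = c(2, 3, 4), c = c(5, 6))
z_demo <- outer(demo, demo,
                FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))

# vapply() visits the cells in column-major order and returns one logical each.
any_flat <- vapply(z_demo, any, logical(1))

# Restore the matrix shape and reuse the row / column names of z_demo.
any_mat <- matrix(any_flat, nrow = nrow(z_demo), dimnames = dimnames(z_demo))
```

Because `any()` of `%in%` is symmetric, `any_mat` equals its own transpose, which is a cheap sanity check.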

Set operations like `intersect`, `union` and `setequal` are also symmetric, so we can work with them similarly.
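For instance, a pairwise `intersect` could be sketched as follows, again on a hypothetical list `demo` with fixed values (`z_int` and `common` are assumed names):

```r
# Hypothetical input with fixed values.
demo <- list(a = c(1, 2), b = c(2, 3, 4), c = c(5, 6))
z_int <- outer(demo, demo,
               FUN = Vectorize(intersect, SIMPLIFY = FALSE, USE.NAMES = FALSE))

# Keep only the lower triangle and name each entry by its (row, column) index.
low <- lower.tri(z_int)
ind <- which(low, arr.ind = TRUE)
common <- setNames(z_int[low], paste(ind[, 1], ind[, 2], sep = ":"))
```

Entry `"2:1"` then holds the elements shared by the second and first vectors, and pairs with nothing in common give a zero-length vector.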

**Example 2: Apply which()**

`which(a %in% b)` is not a symmetric operation, so we have to work with the full matrix.

```
# row index cycles fastest, column index is repeated once per row
NAME <- paste(seq_len(nrow(z)), rep(seq_len(ncol(z)), each = nrow(z)), sep = ":")
which_z <- setNames(lapply(z, which), NAME)
str(which_z)
# List of 16
# $ 1:1: int [1:2] 1 2
# $ 2:1: int(0)
# $ 3:1: int [1:2] 1 2
# $ 4:1: int 3
# $ 1:2: int(0)
# $ 2:2: int [1:3] 1 2 3
# ...
```

A set operation like `setdiff` is also asymmetric and can be dealt with similarly.
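A short sketch of the asymmetry, using a hypothetical list `demo` with fixed values (`z_diff` and the two result names are assumptions):

```r
# Hypothetical input with fixed values.
demo <- list(a = c(1, 2), b = c(2, 3, 4), c = c(5, 6))
z_diff <- outer(demo, demo,
                FUN = Vectorize(setdiff, SIMPLIFY = FALSE, USE.NAMES = FALSE))

# Cell [i, j] holds setdiff(demo[[i]], demo[[j]]), so the two triangles differ.
a_minus_b <- z_diff[[1, 2]]  # elements of a that are not in b
b_minus_a <- z_diff[[2, 1]]  # elements of b that are not in a
```

Since the upper and lower triangles carry different information, the full matrix has to be kept here.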

**Alternatives**

Apart from using `outer()`, we could also use R expressions to obtain the `z` above. Again, I take the binary operation `"%in%"` as an example:

```
op <- "'%in%'" ## operator
lst_name <- names(lst)
op_call <- paste0(op, "(", lst_name, ", ", rep(lst_name, each = length(lst)), ")")
# [1] "'%in%'(vec1, vec1)" "'%in%'(vec2, vec1)" "'%in%'(vec3, vec1)"
# [4] "'%in%'(vec4, vec1)" "'%in%'(vec1, vec2)" "'%in%'(vec2, vec2)"
# ...
```

Then we can parse and evaluate these expressions within `lst`. We may use the combination index as the names of the resulting list:

```
NAME <- paste(seq_along(lst), rep(seq_along(lst), each = length(lst)), sep = ":")
z <- setNames(lapply(parse(text = op_call), eval, lst), NAME)
str(z)
# List of 16
# $ 1:1: logi [1:2] TRUE TRUE
# $ 2:1: logi [1:3] FALSE FALSE FALSE
# $ 3:1: logi [1:4] TRUE TRUE FALSE FALSE
# $ 4:1: logi [1:5] FALSE FALSE TRUE FALSE FALSE
# $ 1:2: logi [1:2] FALSE FALSE
# ...
```
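If building and parsing strings feels fragile, a different technique that gives a flat list in the same order is to enumerate the index pairs with `expand.grid()` and loop over them with `Map()`. This is only a sketch of that swapped-in approach, on a hypothetical list `demo` with fixed values; `idx` and `res` are assumed names:

```r
# Hypothetical input with fixed values.
demo <- list(a = c(1, 2), b = c(2, 3, 4), c = c(5, 6))

# All (i, j) index pairs; the first column varies fastest, matching the
# "1:1", "2:1", "3:1", ... ordering used above.
idx <- expand.grid(i = seq_along(demo), j = seq_along(demo))

# Apply the binary operation to each pair; Map() always returns a list.
res <- Map(function(i, j) demo[[i]] %in% demo[[j]], idx$i, idx$j)
names(res) <- paste(idx$i, idx$j, sep = ":")
```

Swapping `%in%` for another binary function only requires changing the body of the anonymous function.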