Zheyuan Li - 1 year ago 58
R Question

# How to perform pairwise operation like `%in%` and set operations for a list of vectors

This question is motivated by "How can I quickly see if any elements of multiple vectors are equal in R?", but it is not identical to it and is not a duplicate.

As a small example, suppose we have a list of 4 vectors:

``````set.seed(0)
lst <- list(vec1 = sample(1:10, 2, TRUE), vec2 = sample(1:10, 3, TRUE),
vec3 = sample(1:10, 4, TRUE), vec4 = sample(1:10, 5, TRUE))
``````

How can we perform pairwise binary operations like `%in%` and set operations such as `intersect`, `union`, `setdiff`?

Suppose we want pairwise `"%in%"`; how can we further apply `any()` / `all()` / `which()` within each pair?

Note: I don't want to use `combn()`.

We could use `outer(x, y, FUN)`. `x` and `y` need not be a "numeric" input like numerical vector / matrix; a vector input like "list" / "matrix list" is also allowed.

For example, to apply pairwise `"%in%"` operation, we use

``````z <- outer(lst, lst, FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))
#     vec1      vec2      vec3      vec4
#vec1 Logical,2 Logical,2 Logical,2 Logical,2
#vec2 Logical,3 Logical,3 Logical,3 Logical,3
#vec3 Logical,4 Logical,4 Logical,4 Logical,4
#vec4 Logical,5 Logical,5 Logical,5 Logical,5
``````

Since `"%in%"` itself is not vectorized over a list of vectors, we wrap it with `Vectorize("%in%")`. We also need `SIMPLIFY = FALSE`, so that `FUN` returns exactly one list element for each pair `(x[[i]], y[[j]])`. This is important, as `outer` works like:

``````y[[4]] | FUN(x[[1]], y[[4]])  FUN(x[[2]], y[[4]])  FUN(x[[3]], y[[4]])  FUN(x[[4]], y[[4]])
y[[3]] | FUN(x[[1]], y[[3]])  FUN(x[[2]], y[[3]])  FUN(x[[3]], y[[3]])  FUN(x[[4]], y[[3]])
y[[2]] | FUN(x[[1]], y[[2]])  FUN(x[[2]], y[[2]])  FUN(x[[3]], y[[2]])  FUN(x[[4]], y[[2]])
y[[1]] | FUN(x[[1]], y[[1]])  FUN(x[[2]], y[[1]])  FUN(x[[3]], y[[1]])  FUN(x[[4]], y[[1]])
         -------------------  -------------------  -------------------  -------------------
         x[[1]]               x[[2]]               x[[3]]               x[[4]]
``````

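`outer` requires that `length(FUN(x, y)) == length(x) * length(y)`. With the default `SIMPLIFY = TRUE` this does not necessarily hold, because the per-pair results may be collapsed into a vector or matrix; with `SIMPLIFY = FALSE` the result is always a list with one element per pair.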
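As a rough illustration of this length requirement, the sketch below mimics how `outer` expands its two inputs and compares the result length with and without `SIMPLIFY = FALSE`. The list `toy` and the other variable names are made up for this example; the two vectors deliberately have equal length, since that is the case where the default simplification produces a matrix.

```r
# Hypothetical toy list: both vectors have the same length, which is
# the case where mapply()'s default simplification changes the result shape.
toy <- list(a = 1:2, b = 2:3)

# Mimic the expansion outer() performs before calling FUN once.
X <- rep(toy, times = length(toy))
Y <- rep(toy, each = length(toy))

f_simplified <- Vectorize("%in%")                    # SIMPLIFY = TRUE (default)
f_list       <- Vectorize("%in%", SIMPLIFY = FALSE)  # always returns a list

len_simplified <- length(f_simplified(X, Y))  # results collapsed into a 2 x 4 matrix
len_list       <- length(f_list(X, Y))        # one list element per pair

# outer() refuses the simplified result because its length is not 2 * 2.
res <- tryCatch(outer(toy, toy, FUN = f_simplified), error = function(e) e)
```

Here `len_list` equals `length(toy)^2`, while `len_simplified` does not, so the call stored in `res` ends in an error rather than a matrix.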

The result `z` above is a "matrix list": `class(z)` is "matrix" (`c("matrix", "array")` in R 4.0.0 and later), but `typeof(z)` is "list". Read "Why is this matrix not numeric?" for more.
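To make the "matrix list" idea concrete, here is a small sketch on a made-up list `toy` (not the `lst` from the question, whose contents depend on the random number generator). It shows that single cells are extracted with `[[i, j]]`, by position or by name.

```r
# Hypothetical toy list with fixed values, so results do not depend on the RNG.
toy <- list(a = c(1, 2), b = c(2, 3, 4))

z_toy <- outer(toy, toy,
               FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))

type_z <- typeof(z_toy)     # storage type is "list"
dim_z  <- dim(z_toy)        # but it carries a dim attribute: 2 x 2
is_mat <- is.matrix(z_toy)  # so is.matrix() is TRUE

# Cell [i, j] holds toy[[i]] %in% toy[[j]].
b_in_a <- z_toy[["b", "a"]]  # which elements of b occur in a
a_in_b <- z_toy[[1, 2]]      # which elements of a occur in b
```

Note that `[[` returns the logical vector stored in the cell, whereas `z_toy["b", "a"]` with single brackets would return a length-1 list wrapping it.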

If we want to further apply some summary function to each element of `z`, we could use `lapply`. Here I would offer two examples.

Example 1: Apply `any()`

Since `any(a %in% b)` is the same as `any(b %in% a)`, i.e., the operation is symmetric, we only need to work with the lower triangle of `z`:

``````lz <- z[lower.tri(z)]
``````

`lapply` returns an unnamed list, but for readability we want a named list. We may use matrix index `(i, j)` as name:

``````ind <- which(lower.tri(z), arr.ind = TRUE)
NAME <- paste(ind[,1], ind[,2], sep = ":")
any_lz <- setNames(lapply(lz, any), NAME)

str(any_lz)
# List of 6
#  $ 2:1: logi FALSE
#  $ 3:1: logi TRUE
#  $ 4:1: logi TRUE
#  $ 3:2: logi TRUE
#  $ 4:2: logi FALSE
#  $ 4:3: logi TRUE
``````
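If a plain logical matrix is more convenient than a named list, one possible variant is to summarise every cell with `vapply` and restore the dimensions afterwards. This is only a sketch on a made-up list `toy`; the names `z_toy` and `any_mat` are not from the answer above.

```r
# Hypothetical toy list with fixed values.
toy <- list(a = c(1, 2), b = c(2, 3, 4), c = c(5, 6))

z_toy <- outer(toy, toy,
               FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))

# vapply() walks the cells in column-major order and returns one logical
# per cell; matrix() puts them back into the original shape.
any_mat <- matrix(vapply(z_toy, any, logical(1)),
                  nrow = nrow(z_toy), dimnames = dimnames(z_toy))
```

Because `any(a %in% b)` is symmetric, `any_mat` equals its own transpose; `any_mat["a", "b"]` is `TRUE` here (the two vectors share the value 2) and `any_mat["a", "c"]` is `FALSE`.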

Set operations like `intersect`, `union` and `setequal` are also symmetric operations which we can work with similarly.
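For instance, pairwise `intersect` could be handled along the same lines. The sketch below uses a made-up list `toy` and names the results after the vectors rather than their indices; both choices are assumptions of this example.

```r
# Hypothetical toy list with fixed values.
toy <- list(a = c(1, 2), b = c(2, 3, 4), c = c(4, 5))

# Cell [i, j] holds intersect(toy[[i]], toy[[j]]).
s <- outer(toy, toy,
           FUN = Vectorize(intersect, SIMPLIFY = FALSE, USE.NAMES = FALSE))

# intersect() is symmetric (as a set), so the lower triangle is enough.
ind <- which(lower.tri(s), arr.ind = TRUE)
common <- setNames(s[lower.tri(s)],
                   paste(rownames(s)[ind[, 1]], colnames(s)[ind[, 2]], sep = ":"))
```

`common` is then a list with entries `"b:a"`, `"c:a"` and `"c:b"`; the `"c:a"` entry is empty because those two vectors have nothing in common.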

Example 2: Apply `which()`

`which(a %in% b)` is not a symmetric operation, so we have to work with the full matrix.

``````NAME <- paste(1:nrow(z), rep(1:nrow(z), each = ncol(z)), sep = ":")
which_z <- setNames(lapply(z, which), NAME)

str(which_z)
# List of 16
#  $ 1:1: int [1:2] 1 2
#  $ 2:1: int(0)
#  $ 3:1: int [1:2] 1 2
#  $ 4:1: int 3
#  $ 1:2: int(0)
#  $ 2:2: int [1:3] 1 2 3
#  ...
``````

Set operations like `setdiff` are also asymmetric and can be dealt with similarly.
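A minimal sketch of the `setdiff` case, again on a made-up list `toy`, shows why both triangles of the matrix are needed:

```r
# Hypothetical toy list with fixed values.
toy <- list(a = c(1, 2), b = c(2, 3, 4))

# Cell [i, j] holds setdiff(toy[[i]], toy[[j]]).
d <- outer(toy, toy,
           FUN = Vectorize(setdiff, SIMPLIFY = FALSE, USE.NAMES = FALSE))

a_not_b <- d[["a", "b"]]  # elements of a that are not in b
b_not_a <- d[["b", "a"]]  # elements of b that are not in a
```

The two off-diagonal cells differ (`1` versus `3 4`), while the diagonal cells are empty because every vector is compared with itself.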

Alternatives

Apart from using `outer()`, we could also use R expressions to obtain the `z` above. Again, I take binary operation `"%in%"` as an example:

``````op <- "'%in%'"    ## operator

lst_name <- names(lst)
op_call <- paste0(op, "(", lst_name, ", ", rep(lst_name, each = length(lst)), ")")
# [1] "'%in%'(vec1, vec1)" "'%in%'(vec2, vec1)" "'%in%'(vec3, vec1)"
# [4] "'%in%'(vec4, vec1)" "'%in%'(vec1, vec2)" "'%in%'(vec2, vec2)"
# ...
``````

Then we can parse and evaluate these expressions within `lst`. We may use combination index for names of the resulting list:

``````NAME <- paste(1:length(lst), rep(1:length(lst), each = length(lst)), sep = ":")
z <- setNames(lapply(parse(text = op_call), eval, lst), NAME)

str(z)
# List of 16
#  $ 1:1: logi [1:2] TRUE TRUE
#  $ 2:1: logi [1:3] FALSE FALSE FALSE
#  $ 3:1: logi [1:4] TRUE TRUE FALSE FALSE
#  $ 4:1: logi [1:5] FALSE FALSE TRUE FALSE FALSE
#  $ 1:2: logi [1:2] FALSE FALSE
#  ...
``````
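If building and parsing code strings feels fragile, a different technique (not the one used above) is to loop over an index grid with `Map`. The sketch below assumes a made-up list `toy` and uses `expand.grid`, whose first column varies fastest, so the names come out in the same `i:j` order as in the parsed-expression version.

```r
# Hypothetical toy list with fixed values.
toy <- list(a = c(1, 2), b = c(2, 3, 4))

# All (i, j) index pairs; i varies fastest: (1,1), (2,1), (1,2), (2,2).
grid <- expand.grid(i = seq_along(toy), j = seq_along(toy))

# Apply the binary operation to each pair without parse() / eval().
z_alt <- setNames(
  Map(function(i, j) toy[[i]] %in% toy[[j]], grid$i, grid$j),
  paste(grid$i, grid$j, sep = ":")
)
```

Swapping `%in%` for another binary function only requires changing the body of the anonymous function, for example to `setdiff(toy[[i]], toy[[j]])`.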