Gion Mors - 11 months ago
R Question

Subset variables matching pairs of values in R

From a given data frame (myData, in the example below), I would like to subset the variables with values matching at least one pair of values stored in a list (myList, in the example below).

myList <- list(c(8,15), c(2,3))

v1 <- c(1, 2, 3, 8, 15)
v2 <- c(3, 7, 8, 9, 10)
v3 <- c(2, 4, 5, 6, 7)
v4 <- c(8, 15, 6, 7, 9)

myData <- cbind(v1, v2, v3, v4)

Ideally the subset should consist only of v1 and v4, because v1 contains both the pair 8, 15 and the pair 2, 3, and v4 contains the pair 8, 15.

I tried to use the which statement for a single pair (i.e., 8, 15), as follows:

subset <- myData[which(myData==unlist(myList[[1]][1]) & myData==unlist(myList[[1]][2]))]

Still, the output is an empty integer. Am I missing something in the which statement? Plus, how could I implement the code for more than one pair of values?

Many thanks for your help!


Answer Source

I found a solution for this problem:

myData[, unique(which(sapply(myList, function(y) apply(myData, 2, function(x) all(y %in% x))), arr.ind = TRUE)[, 1])]
     v1 v4
[1,]  1  8
[2,]  2 15
[3,]  3  6
[4,]  8  7
[5,] 15  9
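
For readability, the one-liner above can also be split into named steps. This is only a sketch under the assumptions of the example data: the names hits, keep and result are made up for illustration, and rowSums is swapped in for the which/unique combination, which should select the same columns here.

```r
# Example data from the question
myList <- list(c(8, 15), c(2, 3))
myData <- cbind(
  v1 = c(1, 2, 3, 8, 15),
  v2 = c(3, 7, 8, 9, 10),
  v3 = c(2, 4, 5, 6, 7),
  v4 = c(8, 15, 6, 7, 9)
)

# hits[i, j] is TRUE when column i of myData contains every value of pair j.
# Rows are named after the columns of myData; columns follow the order of myList.
hits <- sapply(myList, function(pair) {
  apply(myData, 2, function(column) all(pair %in% column))
})

# A column is kept when it matches at least one pair
# (assumes myList holds more than one pair, so that hits is a matrix).
keep <- rowSums(hits) > 0

# drop = FALSE keeps the result a matrix even if only one column matches
result <- myData[, keep, drop = FALSE]
print(result)
```

Note that %in% only tests whether both values occur somewhere in the column; it does not require them to sit next to each other.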

The one-liner is rather ugly, so here is an explanation. The apply call checks, for each column of myData, whether all values of one element of myList are found in that column. The sapply call repeats this check for every element of myList, which gives a logical matrix with one row per column of myData and one column per pair. The which call with arr.ind returns the row and column index of every TRUE entry in that matrix. Only the unique row indices are needed, because each one identifies a column of myData that contains at least one complete pair. It is a bit complicated, but have a look at it; hopefully it helps :)
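
As for the other part of the question, why the which attempt comes back empty: the likely reason is that the condition is evaluated cell by cell, and no single cell can equal 8 and 15 at the same time. A minimal sketch of the difference, using the example data (the names both_in_one_cell and has_pair are made up for illustration):

```r
myData <- cbind(
  v1 = c(1, 2, 3, 8, 15),
  v2 = c(3, 7, 8, 9, 10),
  v3 = c(2, 4, 5, 6, 7),
  v4 = c(8, 15, 6, 7, 9)
)

# Cell-wise test: is one cell equal to 8 AND equal to 15? This is never TRUE.
both_in_one_cell <- myData == 8 & myData == 15
which(both_in_one_cell)  # integer(0), so indexing with it selects nothing

# Column-wise test: does a column contain an 8 somewhere AND a 15 somewhere?
has_pair <- apply(myData, 2, function(column) all(c(8, 15) %in% column))
myData[, has_pair, drop = FALSE]  # keeps v1 and v4
```

The column-wise test is the same check the solution applies to every pair in myList.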