ebo ebo - 8 months ago 45
R Question

R dataframe subset: look for value in list

I have a dataframe made of strings and lists of strings. Take the following example:

Name Nationality
'Alice' "USA"
'Bob' "MEX"
'Eve' c("USA", "MEX")

That is:

> dput(df)
structure(list(Name = c("Alice", "Bob", "Eve"), Nationality = list( "USA", "MEX", c("USA", "MEX"))), .Names = c("Name", "Nationality"), row.names = c(1L, 2L, 3L), class = "data.frame")

How can I extract all rows whose nationality includes "MEX"?

Expected output:

Name Nationality
'Bob' "MEX"
'Eve' c("USA", "MEX")

Edit: I've tried:

  • df[df$Nationality == "MEX", ]
    but it only returns Bob.

  • df[df$Nationality %in% "MEX", ]
    which also returns only Bob (likewise for
    ... %in% c("MEX"), ]).

  • df["MEX" %in% df$Nationality, ]
    returns all rows, just like
    df[is.element("MEX", df$Nationality), ]

df[grep("MEX", df$Nationality), ]
is working...

Answer Source

The 'Nationality' column is a list of length 3, with one element per row. So we can loop over the list elements, check whether any value in each element is %in% "MEX" to get a logical vector, and subset the rows based on that:

df[sapply(lapply(df$Nationality, `%in%`, "MEX"), any),]
#    Name Nationality
#2  Bob         MEX
#3  Eve    USA, MEX
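
To see why this works, the intermediate steps can be inspected one at a time. This is a minimal sketch that rebuilds df from the dput output in the question; the names per_element, keep and result are only for illustration.

```r
# Rebuild the example data frame from the question's dput() output
df <- structure(
  list(Name = c("Alice", "Bob", "Eve"),
       Nationality = list("USA", "MEX", c("USA", "MEX"))),
  row.names = c(1L, 2L, 3L), class = "data.frame")

# Step 1: test each list element against "MEX";
# this yields one logical vector per row
per_element <- lapply(df$Nationality, `%in%`, "MEX")

# Step 2: collapse each logical vector to a single TRUE/FALSE per row
keep <- sapply(per_element, any)

# Step 3: subset the rows with the resulting logical vector
result <- df[keep, ]

print(keep)         # FALSE TRUE TRUE
print(result$Name)  # "Bob" "Eve"
```

The index has one value per row, which is what row subsetting needs. By contrast, "MEX" %in% df$Nationality from the question yields a single TRUE, which R recycles across every row.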

It can also be simplified as

df[sapply(df$Nationality, function(x) "MEX" %in% x),]
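
A possible variant, not part of the original answer: vapply() does the same job as sapply() but declares the expected result type, so it stops with an error if an element yields anything other than a single logical value. The sketch below assumes df is built as in the question; has_nationality is a hypothetical helper name.

```r
# Rebuild the example data frame from the question's dput() output
df <- structure(
  list(Name = c("Alice", "Bob", "Eve"),
       Nationality = list("USA", "MEX", c("USA", "MEX"))),
  row.names = c(1L, 2L, 3L), class = "data.frame")

# Hypothetical helper: one TRUE/FALSE per list element,
# TRUE when that element contains `code`
has_nationality <- function(x, code) {
  vapply(x, function(el) code %in% el, logical(1))
}

mex_rows <- df[has_nationality(df$Nationality, "MEX"), ]
print(mex_rows$Name)
```

For a zero-row data frame vapply() still returns an empty logical vector, whereas sapply() would return an empty list, which cannot be used as a row index.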