hans glick - 1 month ago
R Question

R: Count how many vectors in a list contain a given value

I have a list of character vectors (each one a "sentence" of words) called list_of_sentences.

s1=sample(letters,size = 5,replace = FALSE)
s2=sample(letters,size = 7,replace = FALSE)
s3=sample(letters,size = 3,replace = FALSE)
list_of_sentences=list(s1,s2,s3)


Suppose I want to know how many sentences contain the word "a". How would you do that, given that I have a list of 50,000 sentences built from 6,000 words? Basically, I'm looking for a "vectorized" version of the %in% function, so that I can run something like:

vectorized_match_fun("a",list_of_sentences)
TRUE FALSE TRUE FALSE FALSE FALSE FALSE ...

Answer

You can run %in% inside an apply-family function such as vapply, which returns one logical value per sentence.

set.seed(13)
s1=sample(letters,size = 5,replace = FALSE)
s2=sample(letters,size = 7,replace = FALSE)
s3=sample(letters,size = 3,replace = FALSE)
list_of_sentences=list(s1,s2,s3)

# For each sentence, TRUE if the word is present
vapply(list_of_sentences,
       function(x, find) any(find %in% x),
       find = "a",              # passed through ... to the function
       FUN.VALUE = logical(1))
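Since the question asks how many sentences contain the word, the logical vector can be summed. The following is a minimal sketch, assuming a small hand-written list (hypothetical data, used instead of sample() so the result does not depend on the random seed):

```r
# Hypothetical, deterministic stand-in for list_of_sentences
list_of_sentences <- list(
  c("a", "b", "c"),
  c("d", "e"),
  c("a", "x", "y")
)

# One logical per sentence: does it contain "a"?
contains_a <- vapply(list_of_sentences,
                     function(x, find) any(find %in% x),
                     find = "a",
                     FUN.VALUE = logical(1))

contains_a        # TRUE FALSE TRUE

# TRUE counts as 1, so the sum is the number of matching sentences
sum(contains_a)   # 2
```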

Based on your comment on another answer, I will point out that %in% accepts vectors on both sides. The answer I've provided lets you take advantage of this, but it still returns only a single logical per sentence, indicating whether any of the words was found. However, I'm not 100% sure that's what you want, as you haven't provided sample output showing how a search for multiple words should be handled.

But consider:

vapply(list_of_sentences,
       function(x, find) any(find %in% x),
       find = c("a", "x"),
       FUN.VALUE = logical(1))
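If what you want is instead one result per word, a possible variation is to drop any() and let vapply return a logical vector per sentence. This is only a sketch under that assumption: the data and the names words and per_word are hypothetical, and it assumes at least two search words (with a single word, vapply returns a plain vector rather than a matrix).

```r
# Hypothetical, deterministic stand-in for list_of_sentences
list_of_sentences <- list(
  c("a", "b", "c"),
  c("d", "e"),
  c("a", "x", "y")
)
words <- c("a", "x")

# Rows are words, columns are sentences
per_word <- vapply(list_of_sentences,
                   function(x, find) find %in% x,
                   find = words,
                   FUN.VALUE = logical(length(words)))
rownames(per_word) <- words

per_word

# Number of sentences containing each word
rowSums(per_word)   # a: 2, x: 1
```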