madmaxthc - 3 months ago
R Question

Getting lost using which() and regex in R

OK, I have a little problem which I believe I can solve with which and grepl (alternatives are welcome), but I am getting lost:

my_query <- c('g1', 'g2', 'g3')
my_data <- c('string2', 'string4', 'string5', 'string6')


I would like to return the indices of the elements of my_query that match somewhere in my_data. In the example above, only 'g2' occurs in my_data (inside 'string2'), so the result would be 2. Please help :)

Answer

It seems to me that there is no easy way to do this without a loop. For each element of my_query, we can use either of the functions below to get TRUE or FALSE:

f1 <- function (pattern, x) length(grep(pattern, x)) > 0L

f2 <- function (pattern, x) any(grepl(pattern, x))

For example,

f1(my_query[1], my_data)
# [1] FALSE
f2(my_query[1], my_data)
# [1] FALSE

Then we use an *apply loop to apply, say, f2 to every element of my_query:

which(unlist(lapply(my_query, f2, x = my_data)))
# [1] 2
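As a possible variation on the loop above (a minimal sketch, not part of the original answer), vapply can replace unlist(lapply(...)). It declares that every call must return a single logical value, so an unexpected result raises an error instead of being silently flattened. The names has_match and hits are hypothetical, introduced only for illustration.

```r
my_query <- c('g1', 'g2', 'g3')
my_data <- c('string2', 'string4', 'string5', 'string6')

# Same logic as f2: TRUE if `pattern` matches at least one element of `x`
# (has_match is a hypothetical name, not from the original answer)
has_match <- function(pattern, x) any(grepl(pattern, x))

# logical(1) tells vapply that each call must return exactly one logical
hits <- vapply(my_query, has_match, logical(1), x = my_data)

# vapply names the result after my_query; drop the names before which()
which(unname(hits))
# [1] 2
```
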

Thanks, that seems to work. To be honest, I preferred your original one-line version. I am not sure why you edited it to define a separate function that is then called with *apply. Is there any advantage compared to which(lengths(lapply(my_query, grep, my_data)) > 0L)?

Well, I am not entirely sure. When I read ?lengths:

 One advantage of ‘lengths(x)’ is its use as a more efficient
 version of ‘sapply(x, length)’ and similar ‘*apply’ calls to
 ‘length’.

I don't know how much more efficient lengths is compared with sapply. Anyway, if it is still a loop internally, then my original suggestion which(lengths(lapply(my_query, grep, my_data)) > 0L) performs two loops. My edit essentially combines the two loops into one, hopefully for a small speed-up (if it is not too tiny to matter).
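To make the comparison concrete, here is a small sketch that checks the two formulations return the same indices on the example data. The names two_pass and one_pass are hypothetical labels for the original and the edited suggestion. The sketch says nothing about which is faster.

```r
my_query <- c('g1', 'g2', 'g3')
my_data <- c('string2', 'string4', 'string5', 'string6')

# Original suggestion: lapply() over the queries, then lengths() over the result
two_pass <- which(lengths(lapply(my_query, grep, my_data)) > 0L)

# Edited suggestion: a single lapply() that returns TRUE/FALSE directly
one_pass <- which(unlist(lapply(my_query, function(pattern) any(grepl(pattern, my_data)))))

identical(two_pass, one_pass)
# [1] TRUE
```

Whether the single loop is measurably faster would have to be timed on realistic input sizes, for example by wrapping each expression in system.time().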

You can still arrange my new edit into a single line:

which(unlist(lapply(my_query, function (pattern, x) any(grepl(pattern, x)), x = my_data)))

or

which(unlist(lapply(my_query, function (pattern) any(grepl(pattern, my_data)))))
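One caveat that may be worth keeping in mind: grep and grepl treat each query as a regular expression by default, so a query containing metacharacters such as '.' can match more than intended. The sketch below uses hypothetical inputs (queries and texts are not from the question) and the documented fixed = TRUE argument of grepl to request literal matching instead.

```r
# Hypothetical inputs, chosen only to show the difference
queries <- c('a.c', 'x')
texts <- c('abc', 'xyz')

# Default: '.' is a regex wildcard, so 'a.c' matches 'abc'
regex_hits <- which(unlist(lapply(queries, function(pattern) any(grepl(pattern, texts)))))
regex_hits
# [1] 1 2

# fixed = TRUE: 'a.c' is matched literally and no longer matches 'abc'
literal_hits <- which(unlist(lapply(queries, function(pattern) any(grepl(pattern, texts, fixed = TRUE)))))
literal_hits
# [1] 2
```

For the queries in the question ('g1', 'g2', 'g3') both variants give the same answer, since none of them contains a metacharacter.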