Atticus29 Atticus29 - 10 months ago 61
R Question

Selecting rows in data.table on the basis of a substring match to any of multiple columns

I have a data.table like this one, but with many more columns:

the_dt = data.table(DetailCol1=c("Deets1","Deets2","Deets3","Deets4"), DetailCol2 = c("MoreDeets1","MoreDeets2","MoreDeets3","MoreDeets4"), DataCol1=c("ARP","AARPP","ABC","ABC"), DataCol2=c("ABC","ABC","ABC","ARPe"), DataCol3 = c("ABC", "ARP", "ABC","ABC"))

I want to retrieve DetailCol1 of only those rows that contain a match to the string 'ARP'.

This question was useful in pointing me to
, but I'm still not sure how do this for multiple columns, especially if there are dozens of columns in which I would like to search.

For instance, this is how I could search within DataCol1
the_dt[DataCol1 %like% 'ARP',DetailCol1]
, but how would I conduct the same search in DataCols 1-100?

Answer Source

We can specify the columns to compare in .SDcol, loop through it with lapply, convert it to logical using %like%, check whether there is at least one TRUE per each row using Reduce, use that to subset the elements from 'DetailCol1'.

the_dt[the_dt[, Reduce(`|`, lapply(.SD, `%like%`, "ARP")),
                       .SDcols= DataCol1:DataCol3], DetailCol1]