Bax Baxov - 2 months ago
R Question

Match patterns in a vector against strings in a data frame

I have a data frame with two columns and a vector of names.
How can I select the rows of the data frame whose names match the strings in the vector?

name = c("p4@HPS1", "p7@HPS2", "p4@HPS3", "p7@HPS4", "p7@HPS5", "p9@HPS6", "p11@HPS7", "p10@HPS8", "p15@HPS9")
expression = c(118.84, 90.04, 106.6, 104.99, 93.2, 66.84, 90.02, 108.03, 111.83)
dataset <- data.frame(name, expression, stringsAsFactors = FALSE)  # cbind() would coerce 'expression' to character
nam <- c("HPS5", "HPS6", "HPS9", "HPS2")


The result should be a data frame containing only the matching rows.
I tried:
dataset[mapply(grepl,nam,dataset$name)]

but it didn't work.

Answer

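We can paste the elements of 'nam' together with collapse = "|" to build a single regular expression ("HPS5|HPS6|HPS9|HPS2"), use it as the pattern argument of grep to get the indices of the matching rows, and subset 'dataset' with those indices.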

dataset[grep(paste(nam, collapse="|"), dataset$name),]
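One caveat: grep matches the collapsed pattern anywhere in the string, so a hypothetical entry "HPS1" in 'nam' would also match a name such as "p11@HPS10". The following is a minimal sketch of a stricter variant. It assumes every name has the form prefix@ID, as in the example data, and that the IDs contain no regex metacharacters. It also shows a non-regex alternative based on %in%.

```r
# Example data from the question
name <- c("p4@HPS1", "p7@HPS2", "p4@HPS3", "p7@HPS4", "p7@HPS5",
          "p9@HPS6", "p11@HPS7", "p10@HPS8", "p15@HPS9")
expression <- c(118.84, 90.04, 106.6, 104.99, 93.2, 66.84, 90.02, 108.03, 111.83)
dataset <- data.frame(name, expression, stringsAsFactors = FALSE)
nam <- c("HPS5", "HPS6", "HPS9", "HPS2")

# Build "@(HPS5|HPS6|HPS9|HPS2)$": the ID must follow "@" and end the string,
# so "HPS1" would no longer match "HPS10" (assumes names look like prefix@ID)
pattern <- paste0("@(", paste(nam, collapse = "|"), ")$")
result <- dataset[grep(pattern, dataset$name), ]
print(result)

# Non-regex alternative: strip everything up to "@" and compare the IDs exactly
result2 <- dataset[sub(".*@", "", dataset$name) %in% nam, ]
print(result2)
```

Both versions keep the rows in their original order in 'dataset', not in the order of 'nam'.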

If we want to keep the OP's approach, wrap the 'name' column in a list. Otherwise mapply iterates over the individual elements of 'name' in parallel with those of 'nam', and because the two vectors have different lengths (9 and 4), it warns that the longer argument is not a multiple of the length of the shorter one. With the list, mapply returns a logical matrix with one row per row of 'dataset' and one column per element of 'nam'. We take its rowSums and check whether they are greater than 0 to get a logical vector for subsetting the rows. Note also the comma in dataset[..., ]: without it, the index is applied to the columns instead of the rows.

dataset[rowSums(mapply(grepl, nam, list(dataset$name)))>0,]
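The following sketch keeps the intermediate matrix so you can inspect what rowSums works on. It uses the question's data, built with data.frame() instead of cbind(). Each column is the grepl result for one element of 'nam', and each row corresponds to one row of 'dataset'.

```r
# Example data from the question
name <- c("p4@HPS1", "p7@HPS2", "p4@HPS3", "p7@HPS4", "p7@HPS5",
          "p9@HPS6", "p11@HPS7", "p10@HPS8", "p15@HPS9")
expression <- c(118.84, 90.04, 106.6, 104.99, 93.2, 66.84, 90.02, 108.03, 111.83)
dataset <- data.frame(name, expression, stringsAsFactors = FALSE)
nam <- c("HPS5", "HPS6", "HPS9", "HPS2")

# list() makes mapply reuse the whole 'name' column for every pattern,
# so the result is a 9 x 4 logical matrix (rows = dataset rows, cols = nam)
hits <- mapply(grepl, nam, list(dataset$name))
print(dim(hits))
print(colnames(hits))

# A row is kept if at least one pattern matched it
keep <- rowSums(hits) > 0
result <- dataset[keep, ]  # the trailing comma selects rows, not columns
print(result)
```

Passing the column through MoreArgs = list(x = dataset$name) is an equivalent way to stop mapply from iterating over it.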