Pierre Laurent - 8 months ago
R Question

Using R, how to filter a column to keep only the items contained in another vector?

I have a data frame like this one:

columna <- c(1,2,3)
columnb <- c("a b e", "c d", "a c d")
columnc <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
alldata <- data.frame(columna,columnb,columnc)
tokeep <- c("c", "e")

And I would like to get the same data frame, modified to keep in columnb only
the strings found in tokeep.

Ideally, I would like alldata$columnb to be

[ "e", "c", "c" ]

I first thought I could use something like

filter(alldata, alldata$columnb %in% tokeep)
alldata[which(alldata$columnb %in% tokeep), ]

but I can't manage to find a solution.

Can someone guide me on this?

Answer

We can use gsub to replace the characters we don't want with an empty string:

alldata$columnb <- gsub(paste0("[^", paste0(tokeep, collapse = "|"), "]"), "", alldata$columnb)

#  columna columnb    columnc
#1       1       e 2010-11-01
#2       2       c 2008-03-25
#3       3       c 2007-03-14

The regular expression we are building is

paste0("[^",paste0(tokeep, collapse = "|"), "]")

#[1] "[^c|e]"

which matches any single character other than c, | or e.
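
To see what the | does inside the brackets, here is a quick check on a made-up string (it is not part of the question's data) that compares this class with one built from the same letters and no separator:

```r
# Made-up input for illustration only; it contains a literal "|"
x <- "a|c e"

# Class built with collapse = "|": the pipe is part of the set of kept characters
with_pipe <- gsub("[^c|e]", "", x)

# Class built from the letters alone
without_pipe <- gsub("[^ce]", "", x)

with_pipe     # "|ce"
without_pipe  # "ce"
```

On the question's data both patterns give the same result, because columnb contains no | characters.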


As per Wiktor's comment, | is a literal inside a character class rather than alternation, so we probably need the regex to be

paste0("[^", paste0(tokeep, collapse = ""), "]")
#[1] "[^ce]"
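
Note that a character class works one character at a time, so it fits this example only because every entry in tokeep is a single letter. It also deletes the spaces, so a row that keeps two items would come back glued together. If the items might be longer words, one possible alternative is to split each string on spaces and filter the pieces with %in%, which compares whole elements (this is also why %in% finds nothing when applied to the unsplit strings). A minimal sketch, assuming the items are always separated by a single space; keep_tokens is a hypothetical helper name, not something from the answer above:

```r
# Data from the question
alldata <- data.frame(
  columna = c(1, 2, 3),
  columnb = c("a b e", "c d", "a c d"),
  columnc = as.Date(c("2010-11-1", "2008-3-25", "2007-3-14")),
  stringsAsFactors = FALSE
)
tokeep <- c("c", "e")

# Hypothetical helper: split each string on single spaces, keep only the
# pieces found in `keep`, then join the survivors back with a space.
# Assumes items are separated by exactly one space.
keep_tokens <- function(x, keep) {
  pieces <- strsplit(x, " ", fixed = TRUE)
  vapply(
    pieces,
    function(p) paste(p[p %in% keep], collapse = " "),
    character(1)
  )
}

alldata$columnb <- keep_tokens(alldata$columnb, tokeep)
alldata$columnb
# [1] "e" "c" "c"
```

A row with no matching item becomes an empty string, and a row with several matches keeps them separated by spaces.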