Agustin Indaco - 9 months ago
R Question

R - How to search for a string in one column in other columns of a data frame (ignoring spaces)

This is very similar to this question, but with an added layer. I want to check whether the string in one column occurs in another column. But since the column is empty for some rows, when I run the code below I get a lot of 'TRUE' because they just match spaces. How can I ignore spaces and just match on characters?

word <- c('Hello','','nyc', '')
keywords <- c('hello goodbye nyc','hello goodbye nyc', 'hello goodbye nyc', 'hello goodbye nyc')
df <- data.frame(word, keywords, stringsAsFactors=FALSE)

What I want is to add a new column (word_exists) that tells me whether the string in column 'word' exists among 'keywords'. I tried:

df$word_exists <- mapply(grepl, pattern=df$word, x=df$keywords)

But I get 'TRUE' even for the rows where 'word' is empty, and I think it is because it is recognizing the spaces in 'keywords' and matching them to the empty values in 'word'. Any suggestions?


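The spaces are not the issue: an empty pattern matches any string, so grepl returns TRUE for every row where word is ''. Just use nzchar to check that your pattern has characters: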

transform(df, word_exists=mapply(grepl, pattern=word, x=keywords) & nzchar(word))
#    word          keywords word_exists
# 1 Hello hello goodbye nyc       FALSE
# 2       hello goodbye nyc       FALSE
# 3   nyc hello goodbye nyc        TRUE
# 4       hello goodbye nyc       FALSE
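
Note that row 1 is FALSE because grepl is case-sensitive by default, so 'Hello' does not match 'hello'. If you also want to ignore case, something along these lines might work. It is only a sketch on the sample data from the question; the column name word_exists_ci is made up for illustration, and USE.NAMES=FALSE is there just to keep the result an unnamed logical vector.

```r
# Sample data from the question
word <- c('Hello', '', 'nyc', '')
keywords <- c('hello goodbye nyc', 'hello goodbye nyc', 'hello goodbye nyc', 'hello goodbye nyc')
df <- data.frame(word, keywords, stringsAsFactors=FALSE)

# An empty pattern matches any string; this is where the unwanted TRUEs come from
grepl('', 'hello goodbye nyc')
# [1] TRUE

# Hypothetical variant: case-insensitive match, still guarded by nzchar()
# so that empty words never count as a match
df$word_exists_ci <- mapply(grepl, pattern=df$word, x=df$keywords,
                            MoreArgs=list(ignore.case=TRUE),
                            USE.NAMES=FALSE) & nzchar(df$word)
df$word_exists_ci
# [1]  TRUE FALSE  TRUE FALSE
```

Keep in mind that the values in word are still treated as regular expressions here, so a word containing characters such as '.' or '(' may match more or less than you expect.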