Agustin Indaco - 10 months ago
R Question

R - How to search for a string in one column in other columns of a data frame (ignoring spaces)

This is very similar to this question, but with an added layer. I want to check whether a string in one column exists in another column. Because the column is empty for some rows, running the code below gives me a lot of 'TRUE' values, which seem to come from matching spaces. How can I ignore spaces and match only on characters?

word <- c('Hello','','nyc', '')
keywords <- c('hello goodbye nyc','hello goodbye nyc', 'hello goodbye nyc', 'hello goodbye nyc')
df <- data.frame(word, keywords, stringsAsFactors=FALSE)

What I want is to add a new column (word_exists) that tells me whether the string in column 'word' exists in 'keywords'. I tried:

df$word_exists <- mapply(grepl, pattern=df$word, x=df$keywords)

But I get mostly 'TRUE', and I think it is because it is recognizing empty spaces in 'keywords' and matching them to the empty values in 'word'. Any suggestions?

Answer Source

An empty pattern matches every string, so grepl returns TRUE for the rows where word is empty. Just use nzchar to check that your pattern has characters:

transform(df, word_exists=mapply(grepl, pattern=word, x=keywords) & nzchar(word))
#    word          keywords word_exists
# 1 Hello hello goodbye nyc       FALSE
# 2       hello goodbye nyc       FALSE
# 3   nyc hello goodbye nyc        TRUE
# 4       hello goodbye nyc       FALSE
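
Note that row 1 is FALSE because grepl is case-sensitive by default, so 'Hello' does not match 'hello'. If a case-insensitive match is what you are after (an assumption on my part, the question does not say so), a minimal sketch is to lower-case both columns before matching. It also passes fixed = TRUE so that any regex metacharacters in word are treated literally, which is another assumption about the data rather than something the question requires:

```r
word <- c('Hello', '', 'nyc', '')
keywords <- rep('hello goodbye nyc', 4)
df <- data.frame(word, keywords, stringsAsFactors = FALSE)

# An empty regex pattern matches any string, which is why the
# nzchar() guard is needed for rows where 'word' is empty.
grepl('', 'hello goodbye nyc')
# TRUE

# Lower-case both sides and match literally (fixed = TRUE);
# rows with an empty 'word' are forced to FALSE by nzchar().
df$word_exists <- nzchar(df$word) &
  mapply(grepl, pattern = tolower(df$word), x = tolower(df$keywords),
         MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

df$word_exists
# expected: TRUE FALSE TRUE FALSE
```

tolower is used instead of ignore.case = TRUE because grepl ignores ignore.case when fixed = TRUE. This is still a substring match, so 'nyc' would also match inside a longer keyword that contains those letters.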