ThomasJohnson ThomasJohnson - 22 days ago 7
R Question

Does word exist in other column?

I would like to find if any of the words from columns 1,2 or 3 exist in column 4 - if yes return 1 if no return 0. For example we will have l5 l6 in V1, l5 in V2 and l8 l45 in V3. V4 is l6 and l56 so the new column V5 will return 1 because there is a match - second word from V1 is l6 - new column will return 1.

Another example from below table - row 6 will return 1 into new column because it contains word l7 in V4 which exists in column V3. - I would like to always compare V4 against other columns for output. Thank you.

id V1 V2 V3 V4
1 l7 l7 l7 l7
2 l31 l7 l56 l7 l35 l22
3 l31 l7 l7 l31 l7 l7
4 l22 l16 l22 l28
5 l31 l31 l32 l31
6 l18 l48 l18 l7 l22 l28,l36,l7
7 l31 l31 l31 l32
8 l31 l10 l39 l31,l7
9 l7 l35 l31 l7 l35 l31
10 l36 l36 l36 l7 l36

Answer Source

We can split the columns with strsplit, concatenate with the elements using Map and then compare with the 5th column

as.integer(mapply(function(x, y) any(x %in% y), 
               strsplit(df1[,5], '[, ]'), 
    do.call(Map, c(f = c, lapply(df1[2:4], function(x) strsplit(x, "[, ]"))))))
#[1] 1 0 1 0 1 1 0 1 1 1

Or more compactly, paste the rows (columns 2 to 4) together then do the strsplit and compare

as.integer(sapply(Map(`%in%`, strsplit(df1[,5], '[, ]+'), 
    lapply(strsplit(do.call(paste, df1[2:4]), "[ ,]+"), unique)), any))
#[1] 1 0 1 0 1 1 0 1 1 1