Mehdi Farhangian - 4 months ago
R Question

Subtract two strings from each other

I have the following input

#mydata

ID variable1 variable2
1 a,b,c,d c,a
2 g,f,h h
3 p,l,m,n,c c,l


I want to subtract the strings in variable2 from those in variable1 and get the following output:

#Output
ID Output
1 b,d
2 g,f
3 p,m,n


#dput

structure(list(ID = 1:3, variable1 = structure(1:3, .Label = c("a,b,c,d",
"g,f,h", "p,l,m,n,c"), class = "factor"), variable2 = structure(c(1L,
3L, 2L), .Label = c("c,a", "c,l", "h"), class = "factor")), .Names = c("ID",
"variable1", "variable2"), class = "data.frame", row.names = c(NA,
-3L))

Answer

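We can split each column on "," with strsplit, use Map to take the setdiff of each pair of vectors, and paste the result back together with toString (which joins with ", "). Naming the list output with the 'ID' column lets stack convert it to a 'data.frame', whose columns are then reordered and renamed to 'ID' and 'Output'. Here df1 is the data.frame from the dput in the question.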

setNames(stack(setNames(Map(function(x,y) toString(setdiff(x,y)), 
         strsplit(as.character(df1$variable1), ","), 
         strsplit(as.character(df1$variable2), ",")),
              df1$ID))[2:1], c("ID", "Output"))
 #  ID  Output
 #1  1    b, d
 #2  2    g, f
 #3  3 p, m, n
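
The same idea can be unpacked into separate steps, which may be easier to follow. This is only a sketch: it rebuilds the question's data with character columns instead of factors, the intermediate names v1, v2, diffs and out are made up for illustration, and it swaps toString for paste(collapse = ",") so the result has no space after the comma, as in the requested output.

```r
# Rebuild the question's data (character columns rather than factors)
df1 <- data.frame(
  ID = 1:3,
  variable1 = c("a,b,c,d", "g,f,h", "p,l,m,n,c"),
  variable2 = c("c,a", "h", "c,l"),
  stringsAsFactors = FALSE
)

# Step 1: split each comma-separated string into a character vector
# (as.character() keeps this working if the columns are factors)
v1 <- strsplit(as.character(df1$variable1), ",", fixed = TRUE)
v2 <- strsplit(as.character(df1$variable2), ",", fixed = TRUE)

# Step 2: row by row, keep the elements of variable1 that are not in variable2
diffs <- Map(setdiff, v1, v2)

# Step 3: collapse each result into one string, with no space after the comma
out <- data.frame(
  ID = df1$ID,
  Output = vapply(diffs, paste, character(1), collapse = ","),
  stringsAsFactors = FALSE
)
out
```

Building the data.frame directly also keeps ID as an integer column, whereas stack returns it as a factor.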

A more compact option uses cSplit from splitstackshape, which returns a data.table:

library(splitstackshape)
cSplit(df1, 2:3, ",", "long")[, .(Output = toString(setdiff(variable1, variable2))) , ID]
#   ID  Output
#1:  1    b, d
#2:  2    g, f
#3:  3 p, m, n