Tim Tim - 3 months ago 31
R Question

dplyr::mutate:- new column = difference between two comma-delimited list columns

Example that works:

df <- data.frame(c0=c(1, 2), c1=c("A,B,C", "D,E,F"), c2=c("B,C", "D,E"))
df
# c0 c1 c2
# 1 1 A,B,C B,C
# 2 2 D,E,F D,E

# Add a column d with difference between c1 and c2
df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))

# c0 c1 c2 d
# 1 1 A,B,C B,C A
# 2 2 D,E,F D,E F


I get what I expected above: d is assigned the difference between these two lists of characters (they are already sorted).

However, if I introduce more than one different character it no longer works:

df <- data.frame(c0=c(1, 2), c1=c("A,B,C", "D,E,F,G"), c2=c("B,C", "D,E"))
df
# c0 c1 c2
# 1 1 A,B,C B,C
# 2 2 D,E,F,G D,E

# Add a column d with difference between c1 and c2
df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))
Error: wrong result size (3), expected 2 or 1


What I wanted to get there is:

c0 c1 c2 d
1 1 A,B,C B,C A
2 2 D,E,F,G D,E F,G


I've tried adding a
paste()
around setdiff but that didn't help. In the end I actually want to be able to probably use
tidyr::separate
to split out the d column into new rows like:

c0 c1 c2 d
1 1 A,B,C B,C A
2 2 D,E,F,G D,E F
3 2 D,E,F,G D,E G


What am I doing wrong with the setdiff above?

Thanks

Tim

Answer

You get the error because at row 2 you have more than one element which can not fit a cell, one way is to use rowwise and wrap the result as list so that it can fit and after that use unnest from tidyr to expand the list type column:

library(dplyr)
library(tidyr)
df %>% 
      rowwise() %>% 
      mutate(d=list(setdiff(unlist(strsplit(as.character(c1), ",")), 
                            unlist(strsplit(as.character(c2), ","))))) %>% 
      unnest()

# Source: local data frame [3 x 4]

#      c0      c1     c2     d
#   <dbl>  <fctr> <fctr> <chr>
# 1     1   A,B,C    B,C     A
# 2     2 D,E,F,G    D,E     F
# 3     2 D,E,F,G    D,E     G