MFR MFR - 28 days ago 6
R Question

How to remove duplicated elements from a concatenated string in R

I have the following dataset

path value
1 b,b,a,c 3
2 c,b 2
3 a 10
4 b,c,a,b 0
5 e,f 0
6 a,f 1





df <- data.frame(path = c("b,b,a,c", "c,b", "a", "b,c,a,b", "e,f", "a,f"), value = c(3, 2, 10, 0, 0, 1))


and I want to remove the duplicated elements within each string in the path column. When I use this code, the format of the data changes:

df$path <- sapply(strsplit(as.character(df$path), split=","),
function(x) unique(x))


and the path column now holds character vectors instead of single strings:

path value
1 c("b", "a", "c") 3
2 c("c", "b") 2
...


However, I want the data to look like this:

path value
1 b, a, c 3
2 c, b 2
3 a 10
4 b, c, a 0
5 e, f 0
6 a, f 1

Answer

Replace unique(x) with paste(unique(x), collapse = ', ') so that each de-duplicated vector is joined back into a single string:
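For reference, here is a minimal sketch of the full pipeline with that replacement applied, using the df from the question. The stringsAsFactors = FALSE argument is an addition not in the original; it only keeps path as character regardless of R version.

```r
# Rebuild the example data from the question.
df <- data.frame(
  path  = c("b,b,a,c", "c,b", "a", "b,c,a,b", "e,f", "a,f"),
  value = c(3, 2, 10, 0, 0, 1),
  stringsAsFactors = FALSE  # assumption: keep path as character, not factor
)

# Split each string on commas, drop repeated elements (unique() keeps the
# first occurrence of each, in order), then join what is left back into
# one string. Every call returns exactly one string, so sapply() can
# simplify the result to a character vector instead of a list.
df$path <- sapply(
  strsplit(as.character(df$path), split = ","),
  function(x) paste(unique(x), collapse = ", ")
)
```

If you want R to raise an error whenever a call returns anything other than a single string, vapply(..., character(1)) with the same function should work as a stricter alternative to sapply().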

df
#      path value
# 1 b, a, c     3
# 2    c, b     2
# 3       a    10
# 4 b, c, a     0
# 5    e, f     0