Akshay Hazari - 1 year ago
R Question

Applying a function to get comma separated string using multiple columns in DataFrame and create a third column

I am trying to sort the names in each row and join them into a comma-separated string, stored in a new column.

This is my sample data.frame:

df=data.frame(A=c("A","K","B","D","F"),B =c("E","C","D","A","K"))

A B
1 A E
2 K C
3 B D
4 D A
5 F K


The output I am trying to get looks like this:

A B C
1 A E A , E
2 K C C , K
3 B D B , D
4 D A A , D
5 F K F , K


So far I have tried this:

lapply(df,FUN=paste(sort(df$A,df$B),collapse=" , "))
mapply(FUN= function(x,y)paste(sort(x,y),collapse=" , "),df$A,df$B)


Here I am trying to sort the column values and paste them using ',' to create a unique pair name.

Any help is appreciated.

Answer

You can do it with mapply, but since your data are factors, you need to coerce them to character so they sort properly:

df$C <- mapply(function(x, y){paste(sort(c(as.character(x), as.character(y))), 
                                    collapse = ',')}, df$A, df$B)
df
#   A B   C
# 1 A E A,E
# 2 K C C,K
# 3 B D B,D
# 4 D A A,D
# 5 F K F,K
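
For context on why the attempts in the question fail: sort() sorts a single vector, and its second positional argument is decreasing, so the two values have to be combined with c() before sorting. A minimal sketch of the same mapply approach, using a hypothetical helper named sorted_pair (not part of any package) and the " , " separator from the desired output:

```r
# Hypothetical helper: sort two values and join them with a separator.
sorted_pair <- function(x, y, sep = " , ") {
  # c() builds the single vector that sort() expects;
  # as.character() guards against factor inputs.
  paste(sort(c(as.character(x), as.character(y))), collapse = sep)
}

df <- data.frame(A = c("A", "K", "B", "D", "F"),
                 B = c("E", "C", "D", "A", "K"))

# USE.NAMES = FALSE keeps the result unnamed when the columns are character.
df$C <- mapply(sorted_pair, df$A, df$B, USE.NAMES = FALSE)
df
```

Wrapping the logic in a named function keeps the mapply call short and makes the separator easy to change.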

To simplify a bit, you can just use apply to iterate over the rows:

apply(df, 1, function(x){paste(sort(x), collapse = ',')})

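Since apply treats df as a matrix, it converts everything to character, which happens to be what you want for the sample data. Note that it uses every column in each row, so either run it before adding C or pass only the columns you want, e.g. df[c("A", "B")].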
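
A minimal sketch of the apply variant that assigns the result back to the data frame. It assumes the source columns are named A and B as in the question, and it selects them explicitly so that an existing C column is not included in the sort:

```r
df <- data.frame(A = c("A", "K", "B", "D", "F"),
                 B = c("E", "C", "D", "A", "K"),
                 stringsAsFactors = TRUE)

# Select only the two source columns; apply() converts them to a
# character matrix, so the factor columns sort as text.
df$C <- apply(df[, c("A", "B")], 1,
              function(x) paste(sort(x), collapse = " , "))
df
```

Selecting the columns by name also makes the call safe to re-run after C has been added.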

Also see tidyr::unite for pasting two columns together, though it can't easily sort.
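
As a vectorized alternative that is not part of the answer above: for exactly two columns, base R's pmin() and pmax() return the element-wise smaller and larger value, which amounts to sorting each pair without iterating over rows. A minimal sketch, assuming the values should be compared as character strings:

```r
df <- data.frame(A = c("A", "K", "B", "D", "F"),
                 B = c("E", "C", "D", "A", "K"),
                 stringsAsFactors = FALSE)

# Coerce explicitly so the code also works if the columns are factors.
a <- as.character(df$A)
b <- as.character(df$B)

# pmin() gives the smaller string of each pair, pmax() the larger one.
df$C <- paste(pmin(a, b), pmax(a, b), sep = " , ")
df
```

This only covers the two-column case; with more columns, the row-wise approaches above are the more direct fit.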