Akshay Hazari - 3 months ago
R Question

Applying a function to get comma separated string using multiple columns in DataFrame and create a third column

I am trying to sort the names in each row and paste them into a comma-separated string, which would become a new column.

This is my sample data.frame:

df=data.frame(A=c("A","K","B","D","F"),B =c("E","C","D","A","K"))

A B
1 A E
2 K C
3 B D
4 D A
5 F K


The output I am trying to get looks like this:

A B C
1 A E A , E
2 K C C , K
3 B D B , D
4 D A A , D
5 F K F , K


So far I have tried this :

lapply(df,FUN=paste(sort(df$A,df$B),collapse=" , "))
mapply(FUN= function(x,y)paste(sort(x,y),collapse=" , "),df$A,df$B)


Here I am trying to sort the values in each row and paste them together using ',' to create a unique pair name.

Any help is appreciated.

Answer

You can do it with mapply, but since your data are factors, you need to coerce them to character so they sort properly:

df$C <- mapply(function(x, y){paste(sort(c(as.character(x), as.character(y))), 
                                    collapse = ',')}, df$A, df$B)
df
#   A B   C
# 1 A E A,E
# 2 K C C,K
# 3 B D B,D
# 4 D A A,D
# 5 F K F,K

To simplify a bit, you can just use apply to iterate over the rows:

apply(df, 1, function(x){paste(sort(x), collapse = ',')})

Since it treats df as a matrix, it converts everything to character, which happens to be what you want for the sample data.
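
One thing to watch: if C has already been added to df (for example by the mapply call above), apply(df, 1, ...) would sort and paste C along with A and B. A minimal sketch that avoids this, assuming the two source columns are named A and B as in the sample data, restricts apply to those columns and stores the result:

```r
# Rebuild the sample data. stringsAsFactors = TRUE mimics the factor columns
# discussed above (it was the default before R 4.0).
df <- data.frame(A = c("A", "K", "B", "D", "F"),
                 B = c("E", "C", "D", "A", "K"),
                 stringsAsFactors = TRUE)

# Only pass the source columns to apply(), so an existing C is ignored.
# apply() coerces the subset to a character matrix before iterating over rows.
df$C <- apply(df[, c("A", "B")], 1,
              function(x) paste(sort(x), collapse = ","))

df
```

Selecting the columns by name also keeps the call safe to re-run after df has gained other columns.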

Also see tidyr::unite for pasting two columns together, though it can't easily sort.
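
If the missing sort is the only obstacle and there are exactly two columns, one possible base R workaround (not part of the original answer) is to take the element-wise smaller and larger value with pmin and pmax, which also accept character vectors, and paste those. This is a sketch under that two-column assumption; it does not generalize to three or more columns.

```r
# Sample data from the question.
df <- data.frame(A = c("A", "K", "B", "D", "F"),
                 B = c("E", "C", "D", "A", "K"))

# Coerce explicitly so the code behaves the same whether the columns are
# factors (older R defaults) or already character.
a <- as.character(df$A)
b <- as.character(df$B)

# pmin()/pmax() compare element by element, so each row's pair comes out
# ordered without an explicit loop over rows.
df$C <- paste(pmin(a, b), pmax(a, b), sep = ",")

df
```

Because everything here is vectorized, no per-row function call is needed, which may matter on larger data frames.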
