Mehdi Farhangian - 2 months ago
R Question

Summarise multiple variables to strings in dplyr

I want to summarise two variables into a single string per id. Suppose this is my first dataset:

#visit

id source1 source2
1 a t
2 c l
3 c z
1 b x


And this is the second dataset:

#transaction

id transactions
1 1
3 2
1 2
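
For anyone who wants to reproduce this, the two tables above could be built roughly as follows. This is only a sketch: the column types (integer ids, character sources) are an assumption, since the question shows the data as plain text.

```r
# Sketch of the two example tables; column types are assumed, not given in the question
visit <- data.frame(
  id      = c(1L, 2L, 3L, 1L),
  source1 = c("a", "c", "c", "b"),
  source2 = c("t", "l", "z", "x"),
  stringsAsFactors = FALSE  # keep the sources as character, not factor
)

transaction <- data.frame(
  id           = c(1L, 3L, 1L),
  transactions = c(1L, 2L, 2L)
)
```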


I'd like to join these datasets and, at the same time, collapse the values for each id into a string.

I can do this for one variable (say, source1):

library(dplyr)
result <- left_join(visit, transaction, by = "id")
result2 <- group_by(result, id)
result3 <- summarise(result2, source = toString(unique(source1)), transactions = toString(unique(transactions)))


This gives me the following output:

id source transactions
1 a,b 1,2
3 c 2
2 c NA


But I want to summarise both source variables together, so my desired output would be something like this:

id source transactions
1 a,t > b,x 1,2
3 c,z 2
2 c,l NA

Answer

You can paste the two variables together, using both sep and collapse to combine:

visit %>% left_join(transaction, by = "id") %>%   # name the key explicitly
    group_by(id) %>%
    summarise(source = paste(unique(source1), unique(source2), sep = ', ', collapse = ' > '),
              transaction = toString(unique(transactions)))

## # A tibble: 3 × 3
##      id      source transaction
##   <int>       <chr>       <chr>
## 1     1 a, t > b, x        1, 2
## 2     2        c, l          NA
## 3     3        c, z           2
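
One caveat with the call above: unique() is applied to source1 and source2 separately, so the pairs only line up when the distinct values happen to match row for row. A variant that pastes each row's pair first and de-duplicates afterwards might be safer. The sketch below rebuilds the example data with assumed column types; recent dplyr versions may also warn about a many-to-many match in the join, because id 1 appears twice in both tables.

```r
library(dplyr)

# Example data from the question (column types are assumed)
visit <- data.frame(
  id      = c(1L, 2L, 3L, 1L),
  source1 = c("a", "c", "c", "b"),
  source2 = c("t", "l", "z", "x"),
  stringsAsFactors = FALSE
)
transaction <- data.frame(
  id           = c(1L, 3L, 1L),
  transactions = c(1L, 2L, 2L)
)

result <- visit %>%
  left_join(transaction, by = "id") %>%
  group_by(id) %>%
  summarise(
    # build "source1, source2" per row, drop the repeats created by the join,
    # then collapse the remaining pairs with " > "
    source      = paste(unique(paste(source1, source2, sep = ", ")), collapse = " > "),
    transaction = toString(unique(transactions))
  )
```

On this data it gives the same three rows as the output above; the difference only shows up when one source1 value is paired with several source2 values, or the other way round.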

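Beware, though: paste and toString stupidly coerce NA to the literal string "NA", which is why id 2 shows a character NA rather than a true missing value. You may want to wrap the input in na.omit, or turn the result back into a real NA with na_if.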
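
To make the NA issue concrete, here is a small sketch of both workarounds on a bare vector. The variable names are only illustrative, and the second option assumes that the literal string "NA" never occurs as a legitimate value.

```r
library(dplyr)  # only needed for na_if()

x <- NA_integer_           # e.g. the transactions of an id with no match
as_string <- toString(unique(x))   # the string "NA", not a missing value

# Option 1: drop NAs first; an all-NA group collapses to "", which is mapped to NA
cleaned1 <- na_if(toString(unique(na.omit(x))), "")

# Option 2: build the string first, then map the literal "NA" back to NA
cleaned2 <- na_if(toString(unique(x)), "NA")
```

Option 1 also copes with groups that mix real values and NAs, because the NAs are removed before the string is built. Option 2 only catches the case where the whole string is "NA".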