giacomoV - 3 months ago
R Question

R - paste complex stylised values from list

I am working with several lists, and each list contains a large number of data frames. Each data frame contains 3 variables (cluster, grp, value), such as (example of 1 list):

$`0`
Source: local data frame [1 x 3]

cluster grp value
(int) (int) (chr)
1 1 0 c Personal Care-277

$`1`
Source: local data frame [1 x 3]

cluster grp value
(int) (int) (chr)
1 1 1 b Unpaid-1

$`2`
Source: local data frame [1 x 3]

cluster grp value
(int) (int) (chr)
1 1 2 c Personal Care-1


What I would like is to summarise this information in a vector so that I can analyse it easily [output wanted]:

cluster 1 : (c Personal Care-277) - (b Unpaid-1) - (c Personal Care-1)


What I have tried to do is the following :

library(plyr)
library(dplyr)


1) I first merged all the data frames together by cluster. I chose to use join_all, which seems to work fine for the job, except for the strange colname output.

dt1 = dt %>% lapply(fgr) %>%
  join_all(by = 'cluster') %>%
  `colnames<-`(c("cluster", paste('t', 1:3, sep = '')))


2) Then I used paste to put the values together in a stylised fashion:

dt1 %>%
  mutate(print = paste('cluster: ', cluster, ' (' , t1, ')', '(', t2 , ')', '(', t3 , ')', sep="") ) %>%
  select(print)

# print
# 1 cluster: 1 (c Personal Care-277)(b Unpaid-1)(c Personal Care-1)


The problem is that I have many different lists with many data frames, and some data frames have unequal length. Here the example list has 3 elements, t1, t2, t3 (plus cluster). But some lists have data frames with 4 or more elements.

Questions

First, I wanted to know whether there is a way to automate this paste, in order to avoid writing t1, t2, and so on by hand. Second, do you have a better idea for a routine than the one I showed here?

Thanks

The data (list)

dt = list(structure(list(cluster = structure(1L, .Label = "1", class = "factor"),
grp = structure(1L, .Label = "0", class = "factor"), value = structure(1L, .Label = "c Personal Care-277", class = "factor")), .Names = c("cluster",
"grp", "value"), row.names = c(NA, -1L), class = "data.frame"),
structure(list(cluster = structure(1L, .Label = "1", class = "factor"),
grp = structure(1L, .Label = "1", class = "factor"),
value = structure(1L, .Label = "b Unpaid-1", class = "factor")), .Names = c("cluster",
"grp", "value"), row.names = c(NA, -1L), class = "data.frame"),
structure(list(cluster = structure(1L, .Label = "1", class = "factor"),
grp = structure(1L, .Label = "2", class = "factor"),
value = structure(1L, .Label = "c Personal Care-1", class = "factor")), .Names = c("cluster",
"grp", "value"), row.names = c(NA, -1L), class = "data.frame"))

Answer

You can try,

library(dplyr)
bind_rows(dt) %>% 
        group_by(cluster) %>% 
        summarise(new = paste0('cluster: ', unique(cluster), ' (', paste(value, collapse = ','), ')')) %>% 
        select(new)

# A tibble: 1 × 1
#                                                            new
#                                                          <chr>
#1 cluster: 1 (c Personal Care-277,b Unpaid-1,c Personal Care-1)
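
The output above puts all values in one pair of parentheses, separated by commas, whereas the question asked for each value in its own parentheses, joined by " - ". A minimal base R sketch of that variant follows. The helper name format_clusters is hypothetical (it does not appear in the question or the answer), and the example list is rebuilt with character columns instead of factors only to keep the sketch short. It should work for any number of data frames per list, since nothing refers to t1, t2, ... by name.

```r
# Example list from the question, rebuilt with character columns
# (assumption: the original dput stores them as factors).
dt <- list(
  data.frame(cluster = "1", grp = "0", value = "c Personal Care-277",
             stringsAsFactors = FALSE),
  data.frame(cluster = "1", grp = "1", value = "b Unpaid-1",
             stringsAsFactors = FALSE),
  data.frame(cluster = "1", grp = "2", value = "c Personal Care-1",
             stringsAsFactors = FALSE)
)

# Hypothetical helper: stack all data frames of one list, then build
# one string per cluster, whatever the number of data frames.
format_clusters <- function(df_list) {
  stacked <- do.call(rbind, df_list)

  # as.character() guards against factor columns
  values   <- as.character(stacked$value)
  clusters <- as.character(stacked$cluster)

  # One character vector of values per cluster, in their original order
  groups <- split(values, clusters)

  # Wrap each value in parentheses and join the pieces with " - "
  per_cluster <- vapply(
    groups,
    function(v) paste0("(", v, ")", collapse = " - "),
    character(1)
  )

  paste0("cluster ", names(per_cluster), " : ", per_cluster)
}

res <- format_clusters(dt)
print(res)
# [1] "cluster 1 : (c Personal Care-277) - (b Unpaid-1) - (c Personal Care-1)"
```

With several lists collected in one list (say a hypothetical list_of_lists), lapply(list_of_lists, format_clusters) would apply the same routine to each of them.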