Daniel - 3 months ago 20
R Question

Summarizing multiple columns with dplyr?

I'm struggling a bit with the dplyr syntax. I have a data frame with several variables and one grouping variable. Now I want to calculate the mean of each column within each group, using dplyr in R.

library(dplyr)

df <- data.frame(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:5, 10, replace = TRUE),
                 c = sample(1:5, 10, replace = TRUE),
                 d = sample(1:5, 10, replace = TRUE),
                 grp = sample(1:3, 10, replace = TRUE))
df %>% group_by(grp) %>% summarise(mean(a))


This gives me the mean for column "a" for each group indicated by "grp".

My question is: is it possible to get the means for each column within each group at once? Or do I have to repeat
df %>% group_by(grp) %>% summarise(mean(a))
for each column?

What I would like to have is something like

df %>% group_by(grp) %>% summarise(mean(a:d)) # "mean(a:d)" does not work

Answer

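dplyr 0.2 and later provide summarise_each for this purpose (summarise_each and funs were deprecated in later dplyr releases, so this form applies to older versions):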

df %>% group_by(grp) %>% summarise_each(funs(mean))
#> Source: local data frame [3 x 5]
#> 
#>     grp        a        b        c        d
#>   (int)    (dbl)    (dbl)    (dbl)    (dbl)
#> 1     1 3.000000 2.666667 2.666667 3.333333
#> 2     2 2.666667 2.666667 2.500000 2.833333
#> 3     3 4.000000 1.000000 4.000000 3.000000
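
In dplyr 1.0.0 and later, the same per-group column means are usually written with across() inside summarise(). The following is a minimal sketch, not the original answer's code: the set.seed() call and the result names res and res_ad are additions for illustration, and the data frame simply mirrors the one in the question.

```r
library(dplyr)

# Example data with the same shape as in the question.
# set.seed() is an addition here so the sketch is reproducible.
set.seed(42)
df <- data.frame(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:5, 10, replace = TRUE),
                 c = sample(1:5, 10, replace = TRUE),
                 d = sample(1:5, 10, replace = TRUE),
                 grp = sample(1:3, 10, replace = TRUE))

# Mean of every non-grouping column within each group.
# everything() skips the grouping column automatically.
res <- df %>%
  group_by(grp) %>%
  summarise(across(everything(), mean))

# The same, but naming the column range a:d explicitly,
# which is close to what "mean(a:d)" was meant to express.
res_ad <- df %>%
  group_by(grp) %>%
  summarise(across(a:d, mean))
```

Selecting a:d explicitly is useful when the data frame also holds columns that should not be averaged, such as character or ID columns.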

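Alternatively, the purrr package provided the same functionality at the time of writing (slice_rows and dmap were later moved out of purrr into the purrrlyr package):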

df %>% slice_rows("grp") %>% dmap(mean)
#> Source: local data frame [3 x 5]
#> 
#>     grp        a        b        c        d
#>   (int)    (dbl)    (dbl)    (dbl)    (dbl)
#> 1     1 3.000000 2.666667 2.666667 3.333333
#> 2     2 2.666667 2.666667 2.500000 2.833333
#> 3     3 4.000000 1.000000 4.000000 3.000000