Sam Brightman - 3 months ago
R Question

Adding another grouping with dplyr

I would like to mutate a data frame twice, grouping by two overlapping sets of columns, e.g.:

df <- df %>% group_by(a, b) %>% mutate(x = sum(d))
df <- df %>% group_by(a, b, c) %>% mutate(y = sum(e))


Is there a faster/more elegant way to do this? I was hoping to be able to do something like:

df <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(c) %>%
  mutate(y = sum(e))


Or perhaps save a variable with the first group_by applied and then use it twice.

Answer

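Use add = TRUE in the second group_by to add c to the existing groups, so that y is computed over all three variables (a, b and c) from the OP's example. In dplyr 1.0.0 and later the argument is spelled .add, and add is deprecated.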

 df %>%
   group_by(a, b) %>%
   mutate(x = sum(d)) %>%
   group_by(c, add=TRUE) %>%
   mutate(y = sum(e))

According to the documentation (?group_by):

By default, when add = FALSE, group_by will override existing groups. To instead add to the existing groups, use add = TRUE
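To see the difference the quoted passage describes, here is a minimal sketch on a made-up data frame (the values are hypothetical; only the column names come from the question). It assumes dplyr 1.0.0 or later, where the argument is spelled .add, and uses group_vars() to list the active grouping columns.

```r
library(dplyr)

# Hypothetical toy data; column names follow the question.
df <- data.frame(
  a = c(1, 1, 1, 1),
  b = c(1, 1, 2, 2),
  c = c(1, 2, 1, 1),
  d = 1:4,
  e = 5:8
)

grouped <- df %>% group_by(a, b)

# Default behaviour: the second group_by replaces the existing groups.
overridden <- grouped %>% group_by(c)

# With .add = TRUE, c is appended to the existing groups instead.
extended <- grouped %>% group_by(c, .add = TRUE)

group_vars(overridden)  # "c"
group_vars(extended)    # "a" "b" "c"
```

On older dplyr versions the same call would be written with add = TRUE, as in the answer above.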

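This can also be done with a single group_by call, but only by using a non-dplyr function such as base R's ave. Note that the function must be passed as FUN =, because ave treats unnamed arguments as grouping variables:

 df %>%
   group_by(a, b) %>%
   mutate(x = sum(d), y = ave(e, c, FUN = sum))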
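As a rough check that the two approaches agree, the sketch below runs both on a small invented data frame (the values are hypothetical) and compares the results. It assumes dplyr 1.0.0 or later for .add, and it passes sum to ave through the named FUN argument, since ave treats any unnamed extra argument as a grouping variable.

```r
library(dplyr)

# Hypothetical toy data; column names follow the question.
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2),
  b = c(1, 1, 1, 2, 1, 1),
  c = c(1, 1, 2, 1, 1, 2),
  d = c(1, 2, 3, 4, 5, 6),
  e = c(10, 20, 30, 40, 50, 60)
)

# Approach 1: extend the grouping with .add = TRUE.
chained <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(c, .add = TRUE) %>%
  mutate(y = sum(e)) %>%
  ungroup()

# Approach 2: one group_by, with ave() handling the extra split by c.
with_ave <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d), y = ave(e, c, FUN = sum)) %>%
  ungroup()

# Both approaches should give the same x and y columns.
all.equal(chained$x, with_ave$x)
all.equal(chained$y, with_ave$y)
```

The ave version leaves the result grouped by a and b only, whereas the chained version ends up grouped by a, b and c; the ungroup() calls here remove that difference before comparing.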