Sam Brightman - 6 months ago 47

R Question

I would like to mutate a data frame twice, grouping by two overlapping sets of columns, i.e.:

`df <- df %>% group_by(a, b) %>% mutate(x = sum(d))`

`df <- df %>% group_by(a, b, c) %>% mutate(y = sum(e))`

Is there a faster/more elegant way to do this? I was hoping to be able to do something like:

`df <- df %>% group_by(a, b) %>% mutate(x = sum(d)) %>% group_by(c) %>% mutate(y = sum(e))`

Or perhaps save the result of the first `group_by` in a variable?

Answer

Use `add = TRUE` in the second `group_by` to add `c` to the existing groups, so that the data is grouped by all three variables in the OP's example:

```
df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(c, add = TRUE) %>%
  mutate(y = sum(e))
```

According to the documentation for `?group_by`:

"By default, when add = FALSE, group_by will override existing groups. To instead add to the existing groups, use add = TRUE"
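
As a quick check of the approach above, here is a minimal sketch on made-up data (the data frame and its values are invented for illustration, not taken from the question). It assumes dplyr 1.0.0 or later, where the argument is spelled `.add`; `add` is the older spelling and is deprecated in those versions.

```r
library(dplyr)

# Toy data: column names follow the question, values are invented.
df <- data.frame(
  a = c(1, 1, 1, 2),
  b = c(1, 1, 1, 1),
  c = c("u", "u", "v", "u"),
  d = c(1, 2, 3, 4),
  e = c(10, 20, 30, 40)
)

res <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%          # x: sum of d within each (a, b)
  group_by(c, .add = TRUE) %>%    # groups are now (a, b, c)
  mutate(y = sum(e)) %>%          # y: sum of e within each (a, b, c)
  ungroup()

res$x  # 6 6 6 4
res$y  # 30 30 30 40
```

Calling `ungroup()` at the end is optional, but it keeps the grouping from leaking into later steps.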

This can also be done with a single `group_by` call, but only by using a non-dplyr function such as `ave`:

```
df %>%
  group_by(a, b) %>%
  # FUN must be named: unnamed arguments to ave() are treated as grouping variables
  mutate(x = sum(d), y = ave(e, c, FUN = sum))
```
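
One detail worth noting when trying this: `ave()` has the signature `ave(x, ..., FUN = mean)`, so the function has to be passed by name as `FUN = sum`; an unnamed `sum` would be taken as another grouping variable. The sketch below, again on invented data, compares the single-`group_by` version with plain grouping by all three columns:

```r
library(dplyr)

# Toy data: column names follow the question, values are invented.
df <- data.frame(
  a = c(1, 1, 1, 2),
  b = c(1, 1, 1, 1),
  c = c("u", "u", "v", "u"),
  d = c(1, 2, 3, 4),
  e = c(10, 20, 30, 40)
)

# Single group_by: ave() splits e by c inside each (a, b) group.
one_call <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d), y = ave(e, c, FUN = sum)) %>%
  ungroup()

# Reference: group by all three columns directly.
reference <- df %>%
  group_by(a, b, c) %>%
  mutate(y = sum(e)) %>%
  ungroup()

# TRUE if both approaches agree on y
isTRUE(all.equal(one_call$y, reference$y))
```

Because `mutate` evaluates its expressions once per `(a, b)` group, the `ave` call only ever sees the rows of that group, which is why splitting by `c` alone is enough.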