Alex Coppock - 4 days ago
R Question

dplyr::group_by() with multiple variables but NOT intersection

When you group_by multiple variables, dplyr helpfully finds the intersection of those groups.

For example,

mtcars %>%
  group_by(cyl, am) %>%
  summarise(mean(disp))


yields

Source: local data frame [6 x 3]
Groups: cyl [?]

    cyl    am `mean(disp)`
  <dbl> <dbl>        <dbl>
1     4     0     135.8667
2     4     1      93.6125
3     6     0     204.5500
4     6     1     155.0000
5     8     0     357.6167
6     8     1     326.0000


My question is, is there a way to provide multiple variables, but to summarize marginally? I want output like what you get if you do this by hand, variable by variable.

df_1 <-
  mtcars %>%
  group_by(cyl) %>%
  summarise(est = mean(disp)) %>%
  transmute(group = paste0("cyl_", cyl), est)

df_2 <-
  mtcars %>%
  group_by(am) %>%
  summarise(est = mean(disp)) %>%
  transmute(group = paste0("am_", am), est)

bind_rows(df_1, df_2)


The above code yields

# A tibble: 5 × 2
  group      est
  <chr>    <dbl>
1 cyl_4 105.1364
2 cyl_6 183.3143
3 cyl_8 353.1000
4  am_0 290.3789
5  am_1 143.5308


Ideally, the syntax would be something like

mtcars %>%
  group_by(cyl, am, intersection = FALSE) %>%
  summarise(est = mean(disp))


Does something like this exist in the tidyverse?

(p.s., I get that my group variable in the table above isn't tidy in the sense that it contains two variables in one, but I promise for my purpose it's tidy, OK? :) )

Answer

I'm guessing what you're looking for is the tidyr package...

Something like this should do the trick:

library(dplyr)
library(tidyr)

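mtcars %>%
  # stack cyl and am into key/value pairs; disp is repeated for each
  gather(col, value, cyl, am) %>%
  # build labels such as "cyl_4" or "am_0"
  mutate(group = paste(col, value, sep = "_")) %>%
  group_by(group) %>%
  summarise(est = mean(disp))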
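For what it's worth, gather() has been superseded in tidyr 1.0.0 and later, so here is a rough sketch of the same idea with pivot_longer(). The names marginal_means, col and value are just my own picks for illustration, and I'm assuming both grouping columns have the same type (true for cyl and am, which are both numeric); with mixed types pivot_longer() may need extra handling.

```r
library(dplyr)
library(tidyr)

marginal_means <- mtcars %>%
  # stack cyl and am into name/value pairs; every other column,
  # including disp, is kept on each of the stacked rows
  pivot_longer(c(cyl, am), names_to = "col", values_to = "value") %>%
  # build labels such as "cyl_4" or "am_0"
  mutate(group = paste(col, value, sep = "_")) %>%
  group_by(group) %>%
  summarise(est = mean(disp))

marginal_means
```

Note that the rows come back sorted by group, so the am_ rows appear before the cyl_ rows rather than in the order of the by-hand version in the question.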