jenswirf - 1 year ago 1735
R Question

# Relative frequencies / proportions with dplyr

Suppose I want to calculate the proportion of different values within each group. For example, using the `mtcars` data, how do I calculate the relative frequency of the number of gears by `am` (automatic/manual) in one go with `dplyr`?

``````
library(dplyr)
data(mtcars)
# tbl_dt() wraps the data as a data.table-backed tbl (shipped with older dplyr releases)
mtcars = tbl_dt(mtcars)

# calculate frequency
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())

# am gear  n
#  0    3 15
#  0    4  4
#  1    4  8
#  1    5  5
``````

What I would like to achieve (prettified):

``````
am gear  n rel.freq
0    3 15      79%
0    4  4      21%
1    4  8      62%
1    5  5      38%
``````

EDIT:

For completeness, I'll post my not-so-pretty attempt using the `data.table` special symbol `.N`:

``````
mtcars %>%
  group_by(am) %>%
  mutate(total = .N) %>%
  group_by(am, gear, total) %>%
  summarise(n = n()) %>%
  mutate(rel.freq = n / total)
``````

Try this:

``````
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

#   am gear  n      freq
# 1  0    3 15 0.7894737
# 2  0    4  4 0.2105263
# 3  1    4  8 0.6153846
# 4  1    5  5 0.3846154
``````

From the dplyr vignette: "When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset". Thus, after the `summarise`, the last grouping variable, `gear`, is peeled off and the data is grouped only by `am` (you can check this by calling `groups` on the resulting data). The `mutate` step therefore computes `n / sum(n)` within each `am` group, which is exactly the relative frequency asked for.
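To make the peeling-off behaviour visible, here is a minimal sketch that inspects the grouping after `summarise`. It assumes a reasonably recent dplyr (1.0.0 or later), where `summarise` accepts a `.groups` argument; `"drop_last"` spells out the default behaviour described above. The names `counts` and `result` are only illustrative.

```r
library(dplyr)

# Count cars per (am, gear); summarise() drops the last grouping level (gear)
counts <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(), .groups = "drop_last")

# The result is still grouped, but only by am
group_vars(counts)

# sum(n) is therefore evaluated within each am group
result <- counts %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()  # drop the remaining grouping so later steps are not affected

result
```

Writing `.groups = "drop_last"` explicitly does not change the result; it only documents the intent and silences the informational message that newer dplyr versions print about the remaining grouping.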

For rounding and prettification, please refer to the nice answer by @Tyler Rinker.
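That answer is not reproduced here, but one possible way to get the percentage strings from the question is sketched below. This is only an illustration using base R's `round` and `paste0`, not necessarily what the referenced answer does; the `.groups` argument again assumes dplyr 1.0.0 or later, and the name `pretty` is made up for the example.

```r
library(dplyr)

pretty <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  # Within each am group: share of n, rounded to a whole percent, as text
  mutate(rel.freq = paste0(round(100 * n / sum(n)), "%")) %>%
  ungroup()

pretty$rel.freq
# "79%" "21%" "62%" "38%"
```

Note that `rel.freq` is a character column after this step, so it is best kept as a final presentation step rather than used for further arithmetic.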
