I am following H. Wickham's R for Data Science and could not make a snippet of code from that book work.
I refer to this section and the following graph of the book.
I literally copied and pasted the part of the code from the book, but it does not work as expected.
by_age <- gss_cat %>%
  group_by(age, marital) %>%
  mutate(prop = n / sum(n))

ggplot(by_age, aes(age, prop, color = marital)) +
  geom_line(na.rm = TRUE)
This looks to be a rather simple issue with the code. Yes, it should probably be fixed by Hadley and co, but it's not a big deal.
If you start by printing by_age in the console, you should see:
# A tibble: 351 x 4
# Groups: age, marital
So, the tibble is grouped by both age and marital. This means that the n from count() and the subsequent sum(n) (within the mutate) return the same value, since sum is only being calculated over a group that contains a single row, i.e. sum(n) == n, and therefore prop == 1 everywhere.
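A minimal sketch of the problem, using a small made-up tibble (toy is hypothetical and stands in for gss_cat so the example runs without the forcats data):

```r
library(dplyr)

# Hypothetical toy data, not gss_cat: two ages, two marital statuses
toy <- tibble(
  age     = c(20, 20, 20, 30, 30),
  marital = c("Married", "Married", "Single", "Married", "Single")
)

# count() on a grouped tibble keeps the grouping, so every
# (age, marital) group ends up holding exactly one row
both <- toy %>%
  group_by(age, marital) %>%
  count() %>%
  mutate(prop = n / sum(n))  # sum(n) is taken over that single row

both$prop  # every value is 1
```

Because each group contains one row, dividing n by sum(n) can only ever give 1, which is why the plot shows flat lines.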
You were on the right track with an ungroup(); however, the desired calculation is the proportion of each marital status for each age. So, add a group_by(age) between the count() and the mutate, and you are golden.
by_age <- gss_cat %>%
  filter(!is.na(age)) %>%
  group_by(age, marital) %>%
  count() %>%
  # regroup by age only, so sum(n) is the total for each age
  group_by(age) %>%
  mutate(prop = n / sum(n))

ggplot(by_age, aes(age, prop, color = marital)) +
  geom_line(na.rm = TRUE)
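One way to convince yourself the regrouping does what you want is to check that the proportions add up to 1 within each age. The sketch below again uses a hypothetical toy tibble rather than gss_cat, and it uses count(age, marital) as a shorthand for group_by(age, marital) followed by count(); on an ungrouped tibble that shorthand returns an ungrouped result, so the group_by(age) is still needed.

```r
library(dplyr)

# Hypothetical toy data, not gss_cat
toy <- tibble(
  age     = c(20, 20, 20, 30, 30),
  marital = c("Married", "Married", "Single", "Married", "Single")
)

fixed <- toy %>%
  count(age, marital) %>%    # one row per (age, marital) pair
  group_by(age) %>%          # proportions are computed within each age
  mutate(prop = n / sum(n))

# Sanity check: proportions within each age should sum to 1
check <- fixed %>% summarise(total = sum(prop))
check
```

If total is not 1 for every age, the tibble is still grouped by something other than age alone.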