How do I use dplyr to create proportions of a level of a factor variable for each state? For example, I'd like to add a variable that indicates the percent of females within each state to the data frame.
# gen data
state <- rep(c(rep("Idaho", 10), rep("Maine", 10)), 2)
student.id <- sample(1:1000, 40, replace = TRUE)  # one id per row (40 rows)
gender <- rep( c("Male","Female"), 100*c(0.25,0.75) )
gender <- sample(gender, 40)
school.data <- data.frame(student.id, state, gender)
# attempt so far: this only counts rows per state and female/non-female group
school.data %>%
  group_by(state, gender %in% c("Female")) %>%
  summarise(count = n()) %>%
  mutate(test_count = count)
To add a new column to your existing data frame:
school.data %>% group_by(state) %>% mutate(pct.female = mean(gender == "Female"))
Use summarize instead of mutate if you want one row per state rather than a new column on the original data.
school.data %>% group_by(state) %>% summarize(pct.female = mean(gender == "Female"))
# # A tibble: 2 x 2
#    state pct.female
#   <fctr>      <dbl>
# 1  Idaho       0.75
# 2  Maine       0.70
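If you need the share of every gender level within each state, not just females, one possible approach is to count each state/gender pair and divide by the state total. This is only a minimal sketch that assumes dplyr is installed; the toy data frame and the names gender.props and prop are made up for illustration and are not part of the original question:

```r
library(dplyr)

# Hypothetical toy data, built by hand so the result is reproducible
toy <- data.frame(
  state  = rep(c("Idaho", "Maine"), each = 4),
  gender = c("Female", "Female", "Female", "Male",
             "Female", "Male", "Male", "Male")
)

# Count rows per state/gender pair, then divide each count by its state total
gender.props <- toy %>%
  count(state, gender) %>%
  group_by(state) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

print(gender.props)
```

The result has one row per state/gender combination, and the prop values within each state sum to 1. The same pipeline should work on school.data in place of toy.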