cylondude - 4 months ago
R Question

Filter n rows of grouped data frame when different n for each group

I'd like to pick a different number of rows from each group of my data frame, but I haven't figured out an elegant way to do this with dplyr yet. To pick the same number of rows from each group, I do this:

library(dplyr)

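iris %>%
  group_by(Species) %>%
  arrange(Sepal.Length) %>%
  # Name the ranking column explicitly; with no wt, top_n() ranks by the last column (Species).
  # Note that ties in Sepal.Length can return more than 2 rows per group.
  top_n(2, Sepal.Length)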


But I would like to look up the number of rows I want for each group in another table, such as this sample table:

top_rows_desired <- data.frame(Species = unique(iris$Species),
                               n_desired = c(4, 2, 5))

Answer

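We can left_join 'iris' with 'top_rows_desired' by 'Species' so that every row carries its group's 'n_desired'. Then, with the rows sorted by descending 'Sepal.Length' and grouped by 'Species', we slice the first 'n_desired' rows of each group and drop the 'n_desired' column with select.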

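left_join(iris, top_rows_desired, by = "Species") %>%
  arrange(desc(Sepal.Length)) %>%          # largest Sepal.Length first
  group_by(Species) %>%
  # n_desired is constant within a group, so first() picks its value;
  # seq_len() also copes with n_desired = 0, where seq() would return c(1, 0)
  slice(seq_len(first(n_desired))) %>%
  select(-n_desired)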
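
As a quick sanity check, here is a minimal, self-contained sketch that runs the same join-and-slice idea and compares the rows kept per species with the lookup table. The names result, check and n_kept are only illustrative and are not part of the original question or answer.

```r
library(dplyr)

# Lookup table from the question: number of rows wanted per species
top_rows_desired <- data.frame(Species = unique(iris$Species),
                               n_desired = c(4, 2, 5))

# Join the per-group row count onto iris, sort, then keep the first
# n_desired rows within each species
result <- iris %>%
  left_join(top_rows_desired, by = "Species") %>%
  arrange(desc(Sepal.Length)) %>%
  group_by(Species) %>%
  slice(seq_len(first(n_desired))) %>%
  ungroup() %>%
  select(-n_desired)

# Count the rows kept per species and put them next to the requested counts
# (n_kept is a hypothetical column name chosen for this check)
check <- result %>%
  count(Species, name = "n_kept") %>%
  left_join(top_rows_desired, by = "Species")

print(check)
```

If the approach works as intended, n_kept should equal n_desired in every row of check. A group with fewer rows than its n_desired would simply return all of its rows.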