David Mas - 1 month ago
R Question

Can I subsample different sizes per group with dplyr?

Okay, so I know I could do something like this,

mtcars %>%
group_by(cyl) %>%
sample_n(2)


which will give me,

Source: local data frame [6 x 11]
Groups: cyl [3]

    mpg   cyl  disp    hp  drat    wt  qsec    vs    am
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  21.4     4 121.0   109  4.11 2.780 18.60     1     1
2  33.9     4  71.1    65  4.22 1.835 19.90     1     1
3  18.1     6 225.0   105  2.76 3.460 20.22     1     0
4  21.0     6 160.0   110  3.90 2.875 17.02     0     1
5  15.2     8 304.0   150  3.15 3.435 17.30     0     0
6  10.4     8 460.0   215  3.00 5.424 17.82     0     0
# ... with 2 more variables: gear <dbl>, carb <dbl>


So, 2 samples per cylinder count. This looks cool. However, is there a way to pass a vector of sizes matching the unique values of the grouping variable, so that I get n = 1 for cars with 4 cylinders, n = 10 for cars with 6 cylinders, and so on?

Thanks!

Answer

Sample each group individually and then bind the results together. I assume you already have dplyr loaded:

bind_rows(
  # one random row from the 4-cylinder cars
  mtcars %>%
    filter(cyl == 4) %>%
    sample_n(1),
  # six random rows from the 6-cylinder cars
  mtcars %>%
    filter(cyl == 6) %>%
    sample_n(6))

We can't sample 10 rows with cyl == 6 without replacement because mtcars has only 7 of them, so the example takes 6 ;)
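
With more than a couple of groups, the same "sample each group, then bind" idea can probably be written once instead of repeating a block per group. Here is a minimal sketch, assuming a named vector called sizes (a hypothetical name, not something from the question) whose names are the cyl values and whose values are the number of rows wanted from each group. Each size is capped at the group's row count so that sampling without replacement does not fail:

```r
library(dplyr)

# Hypothetical lookup: names are cyl values, values are rows wanted per group.
sizes <- c("4" = 1, "6" = 10, "8" = 3)

# Split into one data frame per cyl value; the list is named "4", "6", "8".
groups <- split(mtcars, mtcars$cyl)

sampled <- bind_rows(
  Map(
    # Cap n at the group size so sampling without replacement never errors.
    function(df, n) sample_n(df, min(n, nrow(df))),
    groups,
    sizes[names(groups)]  # reorder the sizes to match the split order
  )
)

# Number of rows drawn per group
count(sampled, cyl)
```

Here the request for 10 six-cylinder rows is silently reduced to 7, because that is all mtcars has. This sketch expects every cyl value to have an entry in sizes; a missing entry would produce NA and an error. If you really want more rows than a group contains, you could drop the cap and pass replace = TRUE to sample_n instead.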
