Latrunculia - 1 month ago
R Question

Is there a way to sample groups of data in a grouped data frame (dplyr)?

Say I have a data frame that is grouped by 2 factors. Is there a way to sample whole groups of data with dplyr? (Note: not sample within groups.)

Example:

DF <- data.frame(A = rep(LETTERS[1:4], each = 6),
                 B = rep(1:2, 12),
                 C = rnorm(24))

# base R solution

DF$group_var <- paste(DF$A, DF$B, sep = "_")
DF_sample <- DF[DF$group_var %in% sample(unique(DF$group_var), 3), ]

# possible dplyr solution? (sample_group_of_data() is a hypothetical function)

DF_sample <- DF %>% group_by(A,B) %>% sample_group_of_data(3)

Answer

Here's another pipe solution; it works whether or not the data is grouped:

library(dplyr)  # provides %>% and bind_rows()

DF %>% split(interaction(.$A, .$B)) %>% sample(3) %>% bind_rows()
# Source: local data frame [9 x 3]
# 
#       A     B          C
#   (fctr) (int)      (dbl)
# 1      B     1  0.2623781
# 2      B     1 -0.8193225
# 3      B     1  0.3348400
# 4      D     1  1.0744650
# 5      D     1  1.3528529
# 6      D     1  0.3016770
# 7      A     2 -0.1920754
# 8      A     2  0.6917583
# 9      A     2  0.1985326

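The pipe reads left to right: split() breaks the data frame into a list with one data frame per combination of A and B, sample(3) draws three elements of that list at random, and bind_rows() stacks the chosen pieces back into a single data frame.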
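If you would rather stay with data-frame verbs throughout, the same idea can be sketched as a join: pick the group keys first, then keep only the rows that match them. This is a minimal sketch, not the only way to do it; the names sampled_keys and the set.seed() call are my own additions for illustration and reproducibility, not part of the question.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Same example data as in the question: 4 x 2 = 8 groups of 3 rows each
DF <- data.frame(A = rep(LETTERS[1:4], each = 6),
                 B = rep(1:2, 12),
                 C = rnorm(24))

# One row per (A, B) combination that actually occurs, then draw 3 of them
sampled_keys <- DF %>%
  distinct(A, B) %>%
  sample_n(3)

# Keep every row of DF whose (A, B) pair is among the sampled keys
DF_sample <- DF %>%
  semi_join(sampled_keys, by = c("A", "B"))
```

Because distinct() only returns combinations present in the data, this never draws an empty group. With the split() approach, interaction() also creates levels for combinations that do not occur, so if your data is missing some combinations you may want split(..., drop = TRUE) there.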