Morpheu5 - 1 year ago 61
R Question

# How do I sample single (random) rows that can be grouped by a column's values?

Here is a sample of the data:

``````
p <- structure(list(name = structure(1:5, .Label = c("Alice", "Bob",
"Charlie", "Dennis", "Earl"), class = "factor"), cohort = structure(c(3L,
3L, 2L, 2L, 1L), .Label = c("X", "Y", "Z"), class = "factor"),
group = structure(c(1L, 1L, 2L, 2L, 1L), .Label = c("A",
"B"), class = "factor"), var = c(1L, 2L, 1L, 3L, 4L)), .Names = c("name",
"cohort", "group", "var"), class = "data.frame", row.names = c(NA,
-5L))
``````

that looks like

``````
     name cohort group var
1   Alice      Z     A   1
2     Bob      Z     A   2
3 Charlie      Y     B   1
4  Dennis      Y     B   3
5    Earl      X     A   4
``````

and I need something like the following, based on the `cohort` column. I need to sample one row in each `cohort` (possibly randomly) so that I don't have multiple people belonging to the same `cohort`.

``````
     name cohort group var
2     Bob      Z     A   2
3 Charlie      Y     B   1
5    Earl      X     A   4
``````

I can `group_by` cohort, but then I'm not sure how to proceed to create a new data frame with only the rows that I need.

Answer Source

You can group by `cohort` and pipe the result to `sample_n`, where `1` means you want one sampled row per group:

``````
library(dplyr)

# Draw one random row from each cohort
p %>% group_by(cohort) %>% sample_n(1)

#> Source: local data frame [3 x 4]
#> Groups: cohort [3]
#>
#>     name cohort  group   var
#>   (fctr) (fctr) (fctr) (int)
#> 1   Earl      X      A     4
#> 2 Dennis      Y      B     3
#> 3  Alice      Z      A     1
``````

Second run:

``````
     name cohort  group   var
   (fctr) (fctr) (fctr) (int)
1    Earl      X      A     4
2 Charlie      Y      B     1
3     Bob      Z      A     2
``````
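
As the two runs show, the sampled rows change every time. If you need the same rows on every run, or a plain data frame rather than a grouped one, something along these lines might help. This is only a sketch: it assumes dplyr 1.0.0 or later, where `slice_sample()` supersedes `sample_n()`; the seed value `123` and the name `one_per_cohort` are arbitrary choices, and the data is rebuilt with `data.frame()` (character columns instead of the factors in the question) just to keep the example self-contained.

```r
library(dplyr)

# Same values as in the question, rebuilt with data.frame() for brevity.
# Note: the columns are character here, not factors as in the original.
p <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "Dennis", "Earl"),
  cohort = c("Z", "Z", "Y", "Y", "X"),
  group  = c("A", "A", "B", "B", "A"),
  var    = c(1L, 2L, 1L, 3L, 4L)
)

# Fix the RNG seed so the same rows are drawn on every run
# (123 is an arbitrary value)
set.seed(123)

one_per_cohort <- p %>%
  group_by(cohort) %>%
  slice_sample(n = 1) %>%  # one random row within each cohort
  ungroup() %>%            # drop the grouping for later steps
  as.data.frame()          # plain data frame, as asked in the question

one_per_cohort
```

The result has exactly one row per `cohort`. Removing the `set.seed()` call gives a fresh random draw on each run, as in the answer above.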