KDA - 7 days ago
R Question

Best way to generate a stratified random sequence in R?

I would like to create a workout schedule in R by generating a sequence of workouts from those contained in a large dataframe (rows are workouts, 3 columns contain features including workout name, category and duration). 'Category' takes values A through N, and there are unequal numbers of workouts in each category. I would like the sequence to be generated as follows:
1. Draw a random workout from category A, then B, then C, through N to start the sequence.
2. Continue generating the sequence by repeating (1), each time drawing without replacement from Categories A through N.
3. When all workouts from any category (e.g., A) have been drawn, 'refill' that category and start drawing without replacement again.
4. Continue until all workouts have been used at least once.
5. Output the constructed sequence but retain all of the original information (i.e., all 3 columns) for each workout (including the repeats).

Thanks for your help with this very important problem :)

Thank you, Jim, for your sample dataset and your response. This is a good representative fake dataset (with fewer categories than the actual data, for simplicity):

set.seed(1)
dat <- data.frame(workout = sample(1:200), category = sample(c('A','B','C'), 200, replace = TRUE))

head(dat)
# workout category
# 1 25 C
# 2 14 C
# 3 191 C
# 4 88 C
# 5 73 B
# 6 34 B


However, I should have specified that I want only one workout per day, so each output row would represent a day in the workout schedule. I am hoping the output will look something like this:

head(dat)
#Day workout category
#1 8 A
#2 73 B
#3 88 C
#4 4 A

Answer
dat <- data.frame(workout = sample(1:15), category = sample(c('A','B','C'), 15, replace = TRUE))
dat
#   workout category
#1       14        A
#2        1        B
#3       11        A
#4        9        B
#5       13        A
#6       12        C
#7        6        B
#8        8        C
#9        3        C
#10      15        A
#11       4        C
#12       7        B
#13       5        A
#14      10        A
#15       2        B
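# Row indices of each category (works for any number of categories)
cats <- split(seq_len(nrow(dat)), dat$category, drop = TRUE)
lens <- lengths(cats)
m <- max(lens)   # the largest category sets the number of rounds
days <- matrix(0L, m, length(cats))
for(i in seq_along(cats)){
    x <- cats[[i]]
    # Shuffle the whole category, 'refilling' it as many times as needed to
    # cover m rounds. Indexing with sample.int() avoids sample(x) treating a
    # single number x as 1:x when a category holds only one workout.
    passes <- ceiling(m/lens[i])
    days[,i] <- unlist(lapply(seq_len(passes), function(p) x[sample.int(length(x))]))[seq_len(m)]
}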

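Each row of days holds one round (one workout from each category), so reading the matrix row by row gives the schedule. The appropriately reordered dataset is then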

    dat[c(t(days)),]
#     workout category
#13         5        A
#7          6        B
#6         12        C
#5         13        A
#2          1        B
#8          8        C
#14        10        A
#12         7        B
#11         4        C
#10        15        A
#4          9        B
#9          3        C
#1         14        A
#15         2        B
#11.1       4        C
#3         11        A
#2.1        1        B
#6.1       12        C
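
If you want the steps from the question followed literally, with a Day column and any number of categories, something like the following might work. It is only a sketch: make_schedule and its category_col argument are hypothetical names I made up, not part of any package, and it assumes the category column is called category as in the sample data. Each category keeps a pool of not-yet-drawn rows that is refilled when it runs empty, and drawing stops after the first complete round in which every workout has appeared at least once.

```r
# Hypothetical helper (illustrative only): one workout per day, cycling through
# the categories in sorted order and refilling a category once it is exhausted.
make_schedule <- function(dat, category_col = "category") {
  # Row indices of each category; drop = TRUE ignores unused factor levels
  full <- split(seq_len(nrow(dat)), dat[[category_col]], drop = TRUE)
  pool <- full                        # rows still available in the current pass
  target <- sum(lengths(full))        # number of rows that must appear at least once
  used <- logical(nrow(dat))
  picks <- integer(0)

  while (sum(used) < target) {
    for (k in names(full)) {          # one round: A, then B, then C, ...
      if (length(pool[[k]]) == 0) pool[[k]] <- full[[k]]  # refill the category
      j <- sample.int(length(pool[[k]]), 1)  # safe even when one row is left
      row <- pool[[k]][j]
      pool[[k]] <- pool[[k]][-j]      # draw without replacement
      picks <- c(picks, row)
      used[row] <- TRUE
    }
  }

  out <- dat[picks, , drop = FALSE]   # keeps all original columns, repeats included
  rownames(out) <- NULL
  cbind(Day = seq_along(picks), out)
}

# Usage with the sample data from the question
set.seed(1)
dat <- data.frame(workout = sample(1:200),
                  category = sample(c("A", "B", "C"), 200, replace = TRUE))
sched <- make_schedule(dat)
head(sched, 4)
```

The result should have (size of the largest category) x (number of categories) rows, the same as dat[c(t(days)),] above. Growing picks inside the loop is fine for a few hundred workouts, but you would probably want to preallocate for much larger data.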