matteo matteo - 3 months ago 15
R Question

Take a random sample of rows within each group

I have a df with almost 50,000 rows spread across 15 different IDs (every ID has thousands of observations). The df looks like this:

       ID Year Temp   ph
1      P1 1996 11.3 6.80
2      P1 1996  9.7 6.90
3      P1 1997  9.8 7.10
...
2000   P2 1997 10.5 6.90
2001   P2 1997  9.9 7.00
2002   P2 1997 10.0 6.93


I want to take 500 random rows for every ID (so 500 for P1, 500 for P2, ...) and create a new df. I tried:

new_df <- df[df$ID %in% sample(unique(df$ID), 500), ]


But this samples from the IDs rather than from the rows, so it picks an ID at random, while I need 500 random rows for every ID.

Answer

Try this:

library(plyr)
# For each ID, draw 500 row indices at random and keep those rows
new_df <- ddply(df, .(ID), function(x) x[sample(nrow(x), 500), ])
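
If you would rather not load plyr, the same split-sample-combine idea can be sketched in base R. This is only an illustration: the toy df below is made up to stand in for the real data, and n_per_id, rows_by_id and picked are names invented for this example. One difference from the call above: sample(nrow(x), 500) stops with an error if a group has fewer than 500 rows, so this version takes min(500, group size) instead. That should not matter for the data in the question, where every ID has thousands of rows.

```r
# Toy data standing in for the real df (hypothetical values).
set.seed(42)
df <- data.frame(
  ID   = rep(c("P1", "P2", "P3"), times = c(2000, 1500, 300)),
  Year = sample(1996:2000, 3800, replace = TRUE),
  Temp = round(runif(3800, 9, 12), 1),
  ph   = round(runif(3800, 6.5, 7.5), 2)
)

n_per_id <- 500

# Split the row indices by ID, then sample within each group.
# min() guards against groups with fewer than n_per_id rows,
# where sampling without replacement would otherwise fail.
rows_by_id <- split(seq_len(nrow(df)), df$ID)
picked <- unlist(lapply(rows_by_id, function(idx) {
  idx[sample.int(length(idx), min(n_per_id, length(idx)))]
}))

new_df <- df[picked, ]
table(new_df$ID)  # number of sampled rows per ID
```

Indexing with sample.int() on the group length, rather than calling sample(idx, ...) directly, avoids the surprise where sample() treats a single number as the range 1:n.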