thisisrg thisisrg - 11 days ago 7
R Question

Conditional subsetting of a data frame in R

Let the data frame be:

set.seed(123)
# hobby has 520 values; name (260) and chess (10) are recycled to 520 rows
df <- data.frame(name  = sample(LETTERS, 260, replace = TRUE),
                 hobby = rep(c("outdoor", "indoor"), 260),
                 chess = rnorm(1:10))


and the condition which I will use to extract from df be:

library(dplyr)

df_cond <- df %>%
  group_by(name, hobby) %>%
  summarize(count = n()) %>%
  mutate(sum.var = sum(count), sum.name = length(name)) %>%
  filter(sum.name == 2) %>%
  mutate(min.var = min(count)) %>%
  mutate(use = ifelse(min.var == count, "yes", "no")) %>%
  filter(grepl("yes", use))


I want to randomly extract the rows from df that correspond to the (name, hobby, count) combinations in df_cond, along with the rest of df. I am having a bit of trouble combining %in% and sample. Thanks for any clue!

Edit: For example:

head(df_cond)
    name   hobby count sum.var sum.name min.var   use
  <fctr>  <fctr> <int>   <int>    <int>   <int> <chr>
1      A  indoor     2       6        2       2   yes
2      B  indoor     8      16        2       8   yes
3      B outdoor     8      16        2       8   yes
4      C outdoor     6      14        2       6   yes
5      D  indoor    10      24        2      10   yes
6      E outdoor     8      18        2       8   yes


Using the above data frame, I want to randomly extract 2 rows (= count) with the combination A + indoor (row 1) from df, 8 rows with the combination B + indoor (row 2) from df, and so on.
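For a single combination, the extraction described above can be sketched in base R. This is only an illustration: sample_rows is a hypothetical helper (not part of the question or of any package), and the requested size is capped at the number of matching rows so the call cannot fail. Indexing through sample.int avoids the pitfall that sample(idx, n) samples from 1:idx when idx has length one.

```r
# Rebuild the example data from the question
set.seed(123)
df <- data.frame(name  = sample(LETTERS, 260, replace = TRUE),
                 hobby = rep(c("outdoor", "indoor"), 260),
                 chess = rnorm(1:10))

# Hypothetical helper: draw up to n random rows of `data` that match
# one (name, hobby) combination, without replacement
sample_rows <- function(data, name_value, hobby_value, n) {
  idx <- which(data$name == name_value & data$hobby == hobby_value)
  if (length(idx) == 0) {
    return(data[0, , drop = FALSE])  # no matching rows: empty data frame
  }
  n <- min(n, length(idx))           # never ask for more rows than exist
  data[idx[sample.int(length(idx), n)], , drop = FALSE]
}

# Example: up to 2 rows for the A + indoor combination
picked <- sample_rows(df, "A", "indoor", 2)
```

The same call could then be repeated for each row of df_cond, passing its name, hobby and count.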

Answer

If I understand correctly, you could use purrr to achieve what you want:

library(dplyr)
library(purrr)

df_cond %>%
  mutate(data = map2(name, hobby, function(x, y) filter(df, name == x, hobby == y))) %>%
  mutate(data = map2(data, count, function(x, y) sample_n(x, size = y)))

And if you want the same form as df:

library(tidyr)  # for unnest()

df_cond %>%
  mutate(data = map2(name, hobby, function(x, y) filter(df, name == x, hobby == y))) %>%
  mutate(data = map2(data, count, function(x, y) sample_n(x, size = y))) %>%
  ungroup() %>%
  select(data) %>%
  unnest(data)  # recent tidyr versions expect the column to unnest to be named
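As a possible alternative that needs only dplyr, the same selection might be written as a join followed by a per-group random rank. This is a sketch rather than the method of the answer above: rand_rank and sampled are names introduced here for illustration, and df_cond is rebuilt with only the columns the join needs.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(123)
df <- data.frame(name  = sample(LETTERS, 260, replace = TRUE),
                 hobby = rep(c("outdoor", "indoor"), 260),
                 chess = rnorm(1:10))

# Same condition as in the question, reduced to the columns used below
df_cond <- df %>%
  group_by(name, hobby) %>%
  summarize(count = n()) %>%
  mutate(sum.name = length(name)) %>%
  filter(sum.name == 2) %>%
  mutate(min.var = min(count)) %>%
  filter(min.var == count) %>%
  ungroup() %>%
  select(name, hobby, count)

sampled <- df %>%
  # keep only rows of df whose (name, hobby) pair appears in df_cond
  inner_join(df_cond, by = c("name", "hobby")) %>%
  group_by(name, hobby) %>%
  # assign a random permutation 1..n() within each group
  mutate(rand_rank = sample(n())) %>%
  # keep `count` randomly chosen rows per group
  filter(rand_rank <= count) %>%
  ungroup() %>%
  select(name, hobby, chess)
```

With df_cond built as in the question, count equals the number of rows of df for that combination, so the filter keeps every matching row. The random rank only starts to matter if count is smaller than the group size.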