lukeA - 1 month ago
R Question

How to combine magrittr pipes and %in% inside a dplyr::filter predicate function?

Given the input data frame

library(dplyr)
( df <- data_frame(id = c(1,1,1,2,2,3), y = letters[1:6]) )
# # A tibble: 6 × 2
# id y
# <dbl> <chr>
# 1 1 a
# 2 1 b
# 3 1 c
# 4 2 d
# 5 2 e
# 6 3 f


Assume one wants to get a subset of df[, c("id", "y")] containing only the two most common ids, which are id 1 and 2:

df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id %>% print -> ids #*
# [1] 1 2


Question: Is there a way to use a pipe in a predicate function inside filter, in the vein of:

df %>% filter(
id %in% df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id )
# Error: no applicable method for 'group_by_' applied to an object of class "logical"

df %>% filter(
id %in% (df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id) )
# Error: cannot handle

df %>% filter(
id %in% {df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id} )
# Error: cannot handle


?

I mean, the last two predicates seem to work as expected outside of filter:

df$id %in% (df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id)
# [1] TRUE TRUE TRUE TRUE TRUE FALSE
df$id %in% {df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id}
# [1] TRUE TRUE TRUE TRUE TRUE FALSE
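
A note on the first attempt, the one without parentheses or braces: its error message is consistent with how R parses %op% operators. All of them, %in% and %>% included, share one precedence level and associate left to right, so id %in% df is evaluated first and its logical result is what gets piped into group_by. The sketch below only inspects the parsed expression with base R and evaluates nothing; the names expr, outer_fn and lhs are made up for illustration.

```r
# quote() captures the expression unevaluated, so dplyr/magrittr are not needed here.
expr <- quote(id %in% df %>% group_by(id))

# The outermost call is the pipe, not %in% ...
outer_fn <- as.character(expr[[1]])

# ... and the pipe's left-hand side is the entire `id %in% df` test.
lhs <- deparse(expr[[2]])

print(outer_fn)
print(lhs)
```

This is why the parenthesised and braced variants at least parse as intended: they force the whole pipeline to become the right-hand side of %in%.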





Side note: I know I could use a temporary variable ids:

df %>% filter(id %in% ids) # *ids <- c(1,2)


or I could use *_join:

df %>% inner_join(
df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% select(-n))


Both yield the expected output:

# # A tibble: 5 × 2
# id y
# <dbl> <chr>
# 1 1 a
# 2 1 b
# 3 1 c
# 4 2 d
# 5 2 e
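
As a variation on the join approach, a filtering join such as dplyr's semi_join keeps the rows of df that have a match in the second table without adding any of its columns, so the select(-n) step is not needed. This is only a sketch: it rebuilds the example data with data.frame, and the names top and res are introduced here for illustration.

```r
library(dplyr)

# Same example data as in the question.
df <- data.frame(id = c(1, 1, 1, 2, 2, 3), y = letters[1:6])

# The two most frequent ids, together with their counts in column n.
top <- df %>% count(id) %>% arrange(desc(n)) %>% head(2)

# Keep only rows of df whose id appears in top; columns of top are not added.
res <- semi_join(df, top, by = "id")
print(res)
```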

Answer

Don't make this complicated for its own sake.

ids <- (df %>% count(id) %>% arrange(n) %>% tail(2))$id
filter(df, id %in% ids)
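
If the temporary variable is the part that bothers you, one option in the same spirit is to move the counting into a small function and call that inside filter. A minimal sketch, assuming the column is always called id; top_ids is a hypothetical helper, not something dplyr provides.

```r
library(dplyr)

# Same example data as in the question.
df <- data.frame(id = c(1, 1, 1, 2, 2, 3), y = letters[1:6])

# Hypothetical helper: return the k most frequent values of the id column.
top_ids <- function(data, k = 2) {
  counts <- count(data, id)            # one row per id, frequency in column n
  counts <- arrange(counts, desc(n))   # most frequent first
  head(counts, k)$id
}

# The predicate stays a plain function call, with no pipe inside filter.
res <- filter(df, id %in% top_ids(df))
print(res)
```

The function call is evaluated like any other expression inside filter, so nothing about pipes inside the predicate has to work.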