Paul Paul - 2 months ago 18
R Question

How to parametrize function calls in dplyr 0.7?

Within the next month dplyr 0.6 is going to be released, including a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr.

Here is a common idiom I use when building reporting and aggregation functions with dplyr:

my_report <- function(data, grouping_vars) {
data %>%
group_by_(.dots=grouping_vars) %>%
summarize(x_mean=mean(x), x_median=median(x), ...)
}


Here,
grouping_vars
is a vector of strings.

I like this idiom because I can pass in string vectors from other places, say a file or a Shiny app's reactive UI, but it's also not too bad for interactive work either.

However, in the new programming with dplyr vignette, I see no examples of how something like this can be done with the new dplyr. I only see examples of how passing strings is no longer the correct approach, and I have to use quosures instead.

I'm happy to adopt quosures, but how exactly do I get from strings to the quosures expected by dplyr here? It doesn't seem feasible to expect the entire R ecosystem to provide quosures to dplyr - lots of times we're going to get strings and they'll have to be converted.

Here is an example showing what you're now supposed to do, and how my old idiom doesn't work:

library(dplyr)
grouping_vars <- quo(am)
mtcars %>%
group_by(!!grouping_vars) %>%
summarise(mean_cyl=mean(cyl))
#> # A tibble: 2 × 2
#> am mean_cyl
#> <dbl> <dbl>
#> 1 0 6.947368
#> 2 1 5.076923

grouping_vars <- "am"
mtcars %>%
group_by(!!grouping_vars) %>%
summarise(mean_cyl=mean(cyl))
#> # A tibble: 1 × 2
#> `"am"` mean_cyl
#> <chr> <dbl>
#> 1 am 6.1875

Answer Source

dplyr will have a specialized group_by function group_by_at to deal with multiple grouping variables. It would be much easier to use the new member of the _at family:

# using the pre-release 0.6.0

cols <- c("am","gear")

mtcars %>%
    group_by_at(.vars = cols) %>%
    summarise(mean_cyl=mean(cyl))

# Source: local data frame [4 x 3]
# Groups: am [?]
# 
# am  gear mean_cyl
# <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000

The .vars argument accepts both character/numeric vector or column names generated by vars:

.vars

A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.