Kevin Burnham - 1 month ago
R Question

Can different parts of dplyr::summarize() be computed conditionally?

Is it possible to have conditional statements operate on different parts of dplyr::summarize()?

Imagine I am working with the iris data and outputting a summary, and I want to include the mean of Sepal.Length only when requested. So I could do something like:

library(dplyr)
data(iris)
include_length <- TRUE
if (include_length) {
    iris %>%
        group_by(Species) %>%
        summarize(mean_sepal_width = mean(Sepal.Width), mean_sepal_length = mean(Sepal.Length))
} else {
    iris %>%
        group_by(Species) %>%
        summarize(mean_sepal_width = mean(Sepal.Width))
}


But is there a way to implement the conditional within the pipeline so that it does not need to be duplicated?

Answer

You can use the .dots argument of dplyr's standard-evaluation (SE) verbs, such as summarize_(), to build the set of summaries programmatically, e.g.:

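library(dplyr)

take_means <- function(include_length) {
    iris %>%
        group_by(Species) %>%
        # summarize_() is the SE version of summarize(): each summary
        # expression is quoted with a one-sided formula (~)
        summarize_(mean_sepal_width = ~mean(Sepal.Width),
                   # if() without an else evaluates to NULL when include_length
                   # is FALSE, so no extra summaries are passed via .dots
                   .dots = if (include_length) {
                       list(mean_sepal_length = ~mean(Sepal.Length))
                   })
}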

take_means(TRUE)
#> # A tibble: 3 × 3
#>      Species mean_sepal_width mean_sepal_length
#>       <fctr>            <dbl>             <dbl>
#> 1     setosa            3.428             5.006
#> 2 versicolor            2.770             5.936
#> 3  virginica            2.974             6.588

take_means(FALSE)
#> # A tibble: 3 × 2
#>      Species mean_sepal_width
#>       <fctr>            <dbl>
#> 1     setosa            3.428
#> 2 versicolor            2.770
#> 3  virginica            2.974
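Note that summarize_() and the .dots argument were deprecated in dplyr 0.7.0 in favour of tidy evaluation. A minimal sketch of the same idea with tidy evaluation, assuming dplyr 0.7 or later (with rlang installed, which dplyr depends on): collect the quoted summaries in a named list, append the optional one only when requested, and splice the list into summarize() with `!!!`. The function name take_means_tidy and the variable summaries are hypothetical names chosen for this illustration.

```r
library(dplyr)

# Hypothetical tidy-eval variant of take_means(); assumes dplyr >= 0.7
take_means_tidy <- function(include_length) {
    # Named list of quoted summary expressions; names become column names
    summaries <- list(mean_sepal_width = rlang::quo(mean(Sepal.Width)))

    # Append the optional summary only when it is requested
    if (include_length) {
        summaries$mean_sepal_length <- rlang::quo(mean(Sepal.Length))
    }

    iris %>%
        group_by(Species) %>%
        # `!!!` splices the list into summarize() as named arguments
        summarize(!!!summaries)
}

take_means_tidy(TRUE)
take_means_tidy(FALSE)
```

The conditional now lives in ordinary R code that builds the argument list, so the pipeline itself is written once regardless of which summaries are included.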