Samuel Howard - 3 months ago
R Question

Using dplyr summarise in R with dynamic variable

I am trying to use summarise and group_by from dplyr in R. However, when I use a variable in place of explicitly naming the summarised column, it uses the sum of dist for the entire data set for each row rather than grouping properly. This can easily be seen in the difference between TestBad and TestGood below. I just want to replicate TestGood's results using the GraphVar variable, as in TestBad.

require("dplyr")
GraphVar <- "dist"

TestBad <- summarise(group_by_(cars,"speed"),Sum=sum(cars[[GraphVar]],na.rm=TRUE),Count=n())

TestGood <- summarise(group_by_(cars,"speed"),Sum=sum(dist,na.rm=TRUE),Count=n())


Thanks!

Answer

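In TestBad, cars[[GraphVar]] extracts the whole dist column from the ungrouped data frame, so every group gets the same overall sum. To refer to the column by name inside the grouped data, you'll need the standard evaluation function summarise_ along with lazyeval::interp.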

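library(lazyeval)
# as.name(GraphVar) turns the string "dist" into a symbol, and interp()
# substitutes it for `var` in the formula before summarise_ evaluates it.
cars %>%
    group_by_("speed") %>%
    summarise_(Sum = interp(~sum(var, na.rm = TRUE), var = as.name(GraphVar)),
               Count = ~n())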
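
For comparison, here is a minimal sketch of the same summary without lazyeval. It is not part of the original answer and assumes a newer dplyr (0.7 or later), where the .data pronoun is available inside summarise. The name TestDynamic is made up for illustration.

```r
library(dplyr)

GraphVar <- "dist"  # column name held in a string, as in the question

# Reference result with the summarised column named explicitly
TestGood <- cars %>%
  group_by(speed) %>%
  summarise(Sum = sum(dist, na.rm = TRUE), Count = n())

# .data[[GraphVar]] looks the column up within the current group,
# whereas cars[[GraphVar]] always returns the whole ungrouped column.
# TestDynamic is a hypothetical name used only for this sketch.
TestDynamic <- cars %>%
  group_by(speed) %>%
  summarise(Sum = sum(.data[[GraphVar]], na.rm = TRUE), Count = n())

# Compare the dynamic result against the explicit one
all.equal(as.data.frame(TestDynamic), as.data.frame(TestGood))
```

If the two results match, the string in GraphVar is being resolved per group rather than against the full data set.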