Robert Lutener - 8 days ago
R Question

Cannot summarise variables by group in dplyr when passing frames to a function

I am hoping to use dplyr to pass multiple data frames to a function and have it return a data frame with the summarised variables. I can do this with no problem at the aggregate level, but when I try to group by a factor, the function returns the overall aggregate values for every group. Here is an example that works fine:

compCalc <- function(frame,segment) {
    newFrame <- frame %>%
        summarise(seg = segment,
            FTEs = sum(FTEs),
            total_TCC = sum(frame$totalCompensationCost),
            TCC_per_fte = sum(frame$totalCompensationCost)/sum(frame$FTEs),
            TCC_per_hour = sum(frame$totalCompensationCost)/sum(frame$hours),
            total_wages = sum(frame$totalWages))
    return(newFrame)
}


I then call the function like so:

nuSectorOverall <- compCalc(dfEx, "allNonUnion")


and I get the expected output:

Overall
seg FTEs total_TCC TCC_per_fte TCC_per_hour total_wages
allNonUnion 3980.559 185865849 46693.4 24.09153 171344280


Now, when I introduce a group_by clause like so:

compCalcEmp <- function(frame,segment) {
    newFrame <- frame %>%
        group_by(employeeGroup) %>%
        summarise(seg = segment,
            FTEs = sum(FTEs),
            total_TCC = sum(frame$totalCompensationCost),
            TCC_per_fte = sum(frame$totalCompensationCost)/sum(frame$FTEs),
            TCC_per_hour = sum(frame$totalCompensationCost)/sum(frame$hours),
            total_wages = sum(frame$totalWages))
    return(newFrame)
}


I run into the following problem:

employeeGroup seg FTEs total_TCC TCC_per_fte TCC_per_hour total_wages total_wages_per_fte
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Bargaining Unit overall 139.2841 185865849 46693.4 24.09153 171344280 43045.28
2 Management & Excluded overall 402.0311 185865849 46693.4 24.09153 171344280 43045.28
3 Non-Union overall 3439.2438 185865849 46693.4 24.09153 171344280 43045.28


As you can see, it calculates the same values for every group, with the exception of FTEs!

I looked long and hard for a similar question, and I apologize if I missed one. Any help would be very much appreciated!

All best,

r

Answer

You don't want to use frame$ to refer to columns of frame inside the dplyr pipe. Try this instead:

compCalcEmp <- function(frame,segment) {
    newFrame <- frame %>% 
        group_by(employeeGroup) %>%
            summarise(seg = segment,
                FTEs = sum(FTEs),
                total_TCC = sum(totalCompensationCost),
                TCC_per_fte = sum(totalCompensationCost)/sum(FTEs),
                TCC_per_hour = sum(totalCompensationCost)/sum(hours),
                total_wages = sum(totalWages))
   return(newFrame)
}

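It worked before without the group_by because, with no groups, summarise operates on the whole frame, so frame$totalCompensationCost and totalCompensationCost cover the same rows. After group_by, frame$totalCompensationCost still pulls the full, ungrouped column, so every group gets the overall total. A bare column name is evaluated within each group instead, which is why FTEs (the only column you referenced without frame$) was already correct.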
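To see the difference in isolation, here is a minimal sketch on made-up data. The toy data frame and the column names wrong and right are hypothetical and only for illustration; they are not from the question's data.

```r
library(dplyr)

# Hypothetical toy data (not the asker's): two groups, three rows
toy <- data.frame(
  employeeGroup = c("A", "A", "B"),
  totalWages    = c(10, 20, 5)
)

by_group <- toy %>%
  group_by(employeeGroup) %>%
  summarise(
    wrong = sum(toy$totalWages),  # ignores grouping: sums the whole column (35)
    right = sum(totalWages)       # sums only the rows in the current group
  )

print(by_group)
```

Here wrong comes out as 35 for both groups, while right is 30 for group A and 5 for group B, which mirrors the repeated totals in the question's grouped output.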
