Asked by FiofanS
R Question

How to calculate average waiting time per group for aggregated data?

I have the following data frame; this question is related to [this thread].

df = data.frame(c("2012","2012","2012","2013"),
c("AAA","BBB","AAA","AAA"),
c("X","Not-serviced","X","Y"),
c(2, 10, 3, 2.5))  # numeric, not quoted strings, so that mean() works

colnames(df) = c("year","type","service_type","waiting_time")
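If waiting_time is entered as quoted strings (for example c("2","10","3","2.5")), data.frame() stores it as a character column, or as a factor when stringsAsFactors = TRUE (the default before R 4.0.0). mean() then returns NA with a warning instead of an average. A minimal sketch of converting such a column, assuming the same four example rows:

```r
# Example data with the waiting times entered as quoted strings
df <- data.frame(year = c("2012", "2012", "2012", "2013"),
                 type = c("AAA", "BBB", "AAA", "AAA"),
                 service_type = c("X", "Not-serviced", "X", "Y"),
                 waiting_time = c("2", "10", "3", "2.5"))

# Convert via as.character() first, so that a factor is converted by its
# labels rather than by its internal integer codes
df$waiting_time <- as.numeric(as.character(df$waiting_time))

str(df$waiting_time)
```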


I want to get the average waiting time separately for the serviced and not-serviced groups. This is how I currently aggregate the data:

library(data.table)
setDT(df)[, .(num_serviced = sum(service_type != "Not-serviced"),
              num_notserviced = sum(service_type == "Not-serviced"),
              avg_wt = mean(waiting_time)),  ## the problem is here
          .(year, type)][, Total := num_serviced + num_notserviced][]


However, avg_wt = mean(waiting_time) computes the average waiting time over all rows in each group (i.e. over Total). I need two separate columns instead: avg_wt_serviced and avg_wt_notserviced.
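To see the difference between the pooled mean and the per-status means, here is a small base R sketch that ignores the year/type grouping for the moment and uses toy vectors copied from the four example rows. Logical subsetting inside mean() restricts the average to the rows of one status:

```r
# Toy vectors matching the four rows of the example data
service_type <- c("X", "Not-serviced", "X", "Y")
waiting_time <- c(2, 10, 3, 2.5)

# Pooled mean over every row (what avg_wt = mean(waiting_time) computes)
avg_wt <- mean(waiting_time)

# Keep only the rows of each status before averaging
avg_wt_serviced    <- mean(waiting_time[service_type != "Not-serviced"])
avg_wt_notserviced <- mean(waiting_time[service_type == "Not-serviced"])

# avg_wt = 4.375, serviced = 2.5, notserviced = 10
c(avg_wt = avg_wt, serviced = avg_wt_serviced, notserviced = avg_wt_notserviced)
```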

The result must be:

year type num_serviced num_notserviced num_total avg_wt_serviced avg_wt_notserviced
2012 AAA 2 0 2 2.5 0

Answer

With dplyr, we can subset waiting_time by service_type inside mean() for each group:

library(dplyr)
df %>%
   group_by(year, type) %>%
   summarise(num_serviced = sum(service_type != "Not-serviced"), 
             num_notserviced = sum(service_type == "Not-serviced"),
             num_total = num_serviced + num_notserviced, 
             # subset waiting_time to one status before averaging;
             # a status with no rows in the group gives NaN
             avg_wt_serv = mean(waiting_time[service_type != "Not-serviced"]),
             avg_wt_notser = mean(waiting_time[service_type == "Not-serviced"]))


#   year  type num_serviced num_notserviced num_total avg_wt_serv  avg_wt_notser
#   <fctr> <fctr>   <int>           <int>     <int>      <dbl>         <dbl>
#1   2012    AAA       2               0         2        2.5            NaN
#2   2012    BBB       0               1         1        NaN            10
#3   2013    AAA       1               0         1        2.5            NaN
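The question started from data.table, and its desired output shows 0 rather than NaN when a group has no rows of a given status. A sketch of the same per-group subsetting in data.table syntax, assuming waiting_time is numeric; mean_or_zero is a hypothetical helper introduced here only to reproduce that 0:

```r
library(data.table)

# Example data with a numeric waiting_time column
dt <- data.table(year = c("2012", "2012", "2012", "2013"),
                 type = c("AAA", "BBB", "AAA", "AAA"),
                 service_type = c("X", "Not-serviced", "X", "Y"),
                 waiting_time = c(2, 10, 3, 2.5))

# Hypothetical helper: returns 0 instead of NaN for an empty subset
mean_or_zero <- function(x) if (length(x) == 0) 0 else mean(x)

res <- dt[, {
  serviced <- service_type != "Not-serviced"  # logical mask within the group
  list(num_serviced       = sum(serviced),
       num_notserviced    = sum(!serviced),
       num_total          = .N,               # number of rows in the group
       avg_wt_serviced    = mean_or_zero(waiting_time[serviced]),
       avg_wt_notserviced = mean_or_zero(waiting_time[!serviced]))
}, by = .(year, type)]

res
```

Reporting 0 follows the desired output in the question; the NaN in the dplyr result is arguably a clearer signal that a group simply has no rows of that status, so which one to use depends on how the table is consumed downstream.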