Neil Neil - 1 year ago 95
R Question

group by in dplyr and calculating percentages

I have following dataframe in r

Service Container_Pick_Day
ABC 0
ABC 1
ABC 1
ABC 2
ABC NA
ABC 0
ABC 1
DEF NA
DEF 0
DEF 1
DEF 1
DEF 1
DEF 2
DEF 1


Column
Container_Pick_Day
is numeric and consist of
NA
values.
What I want to do is calculate
Service
wise percentage of containers picked up on
0th day,after 1 day,2 day and so on
ignoring
NA
values

Desired dataframe would be

Service Container_Pick_Day Percentage
ABC 0 (2/6)*100 = 33.33
ABC 1 (3/6)*100 = 50
ABC 2 (1/6)*100 = 16.67
DEF 0 (1/6)*100 = 16.67
DEF 1 (3/6)*100 = 50
DEF 2 (1/6)*100 = 16.67


I did following in R,but its generating NA values in output

df%>%
group_by(Service) %>%
summarise(pick_day_perc = n()/sum(Container_Pick_Day),na.rm=T) %>%
as.data.frame()


Do I have to group by
Service and Container_Pick_Day
both ?

Answer Source

Adding an answer based on all comments above provided by @nicola, @akrun and myself,

library(dplyr)

#nicola
df %>% 
 filter(!is.na(Container_Pick_Day)) %>% 
 group_by(Service,Container_Pick_Day) %>% 
 summarise(Percentage=n()) %>% 
 group_by(Service) %>% 
 mutate(Percentage=Percentage/sum(Percentage)*100)

#akrun
df %>% 
 filter(complete.cases(Container_Pick_Day)) %>% 
 count(Service, Container_Pick_Day) %>% 
 group_by(Service) %>% 
 transmute(Container_Pick_Day, Percentage=n/sum(n)*100)

#Sotos
df %>% 
 na.omit() %>% 
 group_by_all() %>% 
 summarise(ptg = n()) %>% 
 group_by(Service) %>% 
 mutate(ptg = prop.table(ptg)*100)

All resulting to,

Service Container_Pick_Day Percentage
   <fctr>              <int>      <dbl>
1     ABC                  0   33.33333
2     ABC                  1   50.00000
3     ABC                  2   16.66667
4     DEF                  0   16.66667
5     DEF                  1   66.66667
6     DEF                  2   16.66667
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download