Rilcon42 - 4 years ago
R Question

# dplyr returns the global mean for each group instead of each group's mean

Can someone explain what I am doing wrong here:

``````
library(dplyr)
temp<-data.frame(a=c(1,2,3,1,2,3,1,2,3),b=c(1,2,3,1,2,3,1,2,3))
temp%>%group_by(temp[,1])%>%summarise(n=n(),mean=mean(temp[,2],na.rm=T))

# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     2
2           2     3     2
3           3     3     2
``````

I expected the means to be:

``````
1  1
2  2
3  3
``````

Instead, the mean appears to be the global mean: the sum of all values in column 2 divided by the number of rows, 18/9 = 2.

How do I get the means I expected?

Jonathan von Schroeder
Answer

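The problem is that you are calculating the mean of `temp[,2]` instead of the mean of the column within each group. `temp[,2]` always refers to the whole column of the original data frame, so `mean(temp[,2],na.rm=T)` does not depend on the grouping context at all. Refer to the column by its name, `b`, so that `summarise` evaluates it separately for each group: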

``````
> temp%>%group_by(temp[,1])%>%summarise(n=n(),mean=mean(b,na.rm=T))
# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     1
2           2     3     2
3           3     3     3
``````
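
The same idea applies to the grouping column: it can be referenced by its bare name `a` rather than `temp[,1]`, which also gives the result a cleaner column header. The following is a minimal sketch using the data from the question; the names `result`, `global_mean`, and `mean_b` are illustrative and not part of the original answer.

``````r
library(dplyr)

# Same data as in the question
temp <- data.frame(a = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   b = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# temp[, 2] is the full column regardless of grouping,
# so its mean is the global mean: 18 / 9 = 2
global_mean <- mean(temp[, 2], na.rm = TRUE)

# Bare column names are evaluated within each group
result <- temp %>%
  group_by(a) %>%
  summarise(n = n(), mean_b = mean(b, na.rm = TRUE))

print(global_mean)
print(result)
``````

Here `result` has one row per value of `a`, with `n` equal to 3 in each row and `mean_b` equal to 1, 2, and 3 respectively, while `global_mean` is 2.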