Rilcon42 - 4 years ago
R Question

dplyr returns global mean for each group, instead of each group's mean

Can someone explain what I am doing wrong here:

library(dplyr)
temp<-data.frame(a=c(1,2,3,1,2,3,1,2,3),b=c(1,2,3,1,2,3,1,2,3))
temp%>%group_by(temp[,1])%>%summarise(n=n(),mean=mean(temp[,2],na.rm=T))

# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     2
2           2     3     2
3           3     3     2


I expected the means to be:

1 1
2 2
3 3


Instead, the mean seems to be the global mean (the sum of all values in column 2 divided by the number of rows): 18/9 = 2.

How do I get the mean to be what I expected?

Answer Source

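Your problem is that you are calculating the mean of temp[,2] instead of the column within each group. temp[,2] always refers to the whole second column of the original data frame, so mean(temp[,2],na.rm=T) does not depend on the grouping at all. Refer to the column by its bare name, b, so that summarise evaluates it once per group. You need to do the following: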

> temp%>%group_by(temp[,1])%>%summarise(n=n(),mean=mean(b,na.rm=T))
# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     1
2           2     3     2
3           3     3     3
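
The same idea applies to the grouping step. As a minimal sketch (not part of the original answer), assuming the same temp data frame as in the question, grouping by the bare column name a instead of temp[,1] gives the grouping column a readable name in the result. The variable name result is only for illustration.

```r
library(dplyr)

# Same data as in the question
temp <- data.frame(a = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   b = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# Group by the bare column name, so the grouping column is called `a`
# rather than `temp[, 1]`. Inside summarise, the bare name `b` refers
# only to the rows of the current group.
result <- temp %>%
  group_by(a) %>%
  summarise(n = n(), mean = mean(b, na.rm = TRUE))

print(result)
```

Each group should come back with n = 3 and a mean equal to its own value of a (1, 2 and 3), which matches what the question expected.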