Rilcon42 - 4 years ago
R Question

dplyr returning global mean when columns are specified

I am trying to return the mean for each group, based on this SO post, but the solution doesn't seem to work in this case. Can someone explain why I am still getting a global mean?

tmp = tempfile(fileext = ".xlsx")
download.file(url = "https://www.bls.gov/emp/ind-occ-matrix/occupation.xlsx", destfile = tmp, mode="wb")
library(readxl)
csv <- read_excel(tmp,sheet=8)
########################################################
colnames(csv)<-c("title","code","Occupation Type","Employment2014","Employment2024" ,"EmploymentChange2014-24.Num","EmploymentChange2014-24.Percent","Percent self employed2014","Job openings due to growth and replacements2014-24","Median annual wage2015","Typical education needed for entry","Work experience in a related occupation","Typical on-the-job training needed")
csv<-csv[csv[,3]=="Line item",]
csv$"Median annual wage2015"<-as.numeric(csv$"Median annual wage2015")

library(dplyr)
csv%>%group_by(csv$"Typical education needed for entry")%>%summarise(n=n(),mean=mean(csv$"Median annual wage2015",na.rm=T))

Answer Source

Your dplyr call is not quite right: remove the csv$ prefixes, as shown below. With csv$, mean() pulls the column from the full data frame, outside the context of the dplyr chain and therefore outside the grouping set up by group_by, so every group gets the global mean.

library(dplyr)
csv %>%
  group_by(`Typical education needed for entry`) %>%
  summarise(n = n(),
            mean = mean(`Median annual wage2015`, na.rm = TRUE))
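
To see the difference without downloading the BLS file, here is a minimal sketch on a made-up data frame. The names df, education and wage are hypothetical and chosen only for illustration; the two pipelines are otherwise the same shape as the ones above.

```r
library(dplyr)

# Hypothetical toy data; names and values are illustrative, not from the BLS file.
df <- data.frame(
  education = c("Bachelor", "Bachelor", "High school", "High school"),
  wage      = c(80000, 90000, 30000, 40000)
)

# df$wage is the whole, ungrouped column, so every group gets
# the global mean (60000 for both rows here).
wrong <- df %>%
  group_by(education) %>%
  summarise(n = n(), mean = mean(df$wage, na.rm = TRUE))

# A bare column name is evaluated within each group, so the
# result is one mean per group (85000 and 35000 here).
right <- df %>%
  group_by(education) %>%
  summarise(n = n(), mean = mean(wage, na.rm = TRUE))

print(wrong)
print(right)
```

The n column is identical in both results because n() is evaluated per group either way; only the mean column differs.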

You can also make your code more readable for others by using indentation and line breaks.
