Tobias van Elferen, 4 years ago
R Question

Using summarise with weighted mean from dplyr in R

I'm trying to tidy a dataset, using dplyr. My variables contain percentages and straightforward values (in this case, page views and bounce rates). I've tried to summarize them this way:

require(dplyr)
df<-df%>%
group_by(pagename)%>%
summarise(pageviews=sum(pageviews), bounceRate= weighted.mean(bounceRate,pageviews))


But this returns:

Error: 'x' and 'w' must have the same length


My dataset does not have any NAs in either the page views or the bounce rates.
I'm not sure what I'm doing wrong. Maybe summarise() doesn't work with weighted.mean()?

EDIT

I've added some data:

Source: local data frame [4 x 3]

  pagename bounceRate pageviews
     (chr)      (dbl)     (dbl)
1     url1   72.22222      1176
2     url2   46.42857       733
3     url2   76.92308       457
4     url3   62.06897       601

Answer

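The summarise() command evaluates its arguments in the order they appear, and each result replaces any variable of the same name. Because you reassign pageviews first, weighted.mean() then receives the summed value (length 1) as its weights, while bounceRate still holds one value per row of the group. It's safer to use different names: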

df %>%
   group_by(pagename)%>%
   summarise(pageviews_sum=sum(pageviews), 
      bounceRate_mean= weighted.mean(bounceRate,pageviews))
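To make the cause of the error concrete, here is a minimal base R sketch that reproduces what happens inside the url2 group. The vector names bounce and views are made up for this illustration; the numbers are copied from the sample data in the question.

```r
# Values for the url2 group, copied from the sample data in the question
bounce <- c(46.42857, 76.92308)
views  <- c(733, 457)

# What the original summarise() ends up doing: pageviews has already been
# replaced by its sum, so the weights have length 1 while x has length 2
msg <- tryCatch(
  weighted.mean(bounce, sum(views)),
  error = function(e) conditionMessage(e)
)
print(msg)

# With the untouched per-row page views as weights, the call succeeds
wm <- weighted.mean(bounce, views)
print(wm)
```

The first call should fail with the same length complaint as in the question, and the second should return a value of roughly 58.14 for url2.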

If you really want the original names, you can rename afterward:

df %>%
   group_by(pagename)%>%
   summarise(pageviews_sum=sum(pageviews), 
      bounceRate_mean= weighted.mean(bounceRate,pageviews)) %>% 
   rename(pageviews=pageviews_sum, bounceRate=bounceRate_mean)
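Another option, offered only as a sketch and not part of the answer above: since summarise() works through its arguments in order, computing the weighted mean before pageviews is overwritten should keep the original column names without a rename step. The data frame is rebuilt here from the sample rows in the question, and res is just an illustrative name.

```r
library(dplyr)

# Sample rows from the question
df <- data.frame(
  pagename   = c("url1", "url2", "url2", "url3"),
  bounceRate = c(72.22222, 46.42857, 76.92308, 62.06897),
  pageviews  = c(1176, 733, 457, 601),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(pagename) %>%
  # bounceRate is computed first, while pageviews still holds per-row values;
  # only afterwards is pageviews replaced by its group total
  summarise(bounceRate = weighted.mean(bounceRate, pageviews),
            pageviews  = sum(pageviews))

print(res)
```

This relies entirely on the order of the arguments, so it is more fragile than using distinct names: swapping the two lines brings the error back.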