Jesse - 1 year ago
Python Question

For Loop to determine weighted average python

I'm new to Python and am having trouble writing the correct for loop for this situation.

I have a dataframe, dfclean, that contains two columns: a restaurant star rating (Star_Rating) and the total number of reviews (Review_Count).

I want to find weighted averages for these star ratings (Star_Rating * (Review_Count / total number of reviews)) and add them to a new column called weightedavg.

Here's what I have so far along with notes of what I think I'm doing with each step:

#get total number of reviews
totalreviews = dfclean.Review_Count.sum()

#create empty list to append values to
weightedavg = []

#for loop
for row in range(len(dfclean)):
    weightedavg.append(dfclean.Star_Rating[row] * (dfclean.Review_Count[row] / totalreviews))

#make a new column in df consisting of weightedavg
dfclean['weightedavg'] = weightedavg

Any help would be greatly appreciated!

Answer Source

You shouldn't use a for loop. You can take advantage of broadcasting to do something like the following:

dfclean['weightedavg'] = dfclean['Star_Rating'] * dfclean['Review_Count'] / dfclean['Review_Count'].sum()

This is much faster than using a Python loop and is also syntactically cleaner. You can read about broadcasting in the numpy docs and the pandas docs.
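To see the one-liner in action, here is a minimal sketch on a small made-up dataframe. The sample values are hypothetical, and only the column names come from the question. It also checks that the per-row values sum to the overall weighted average, using np.average as a cross-check.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
dfclean = pd.DataFrame({
    "Star_Rating": [4.0, 3.0, 5.0],
    "Review_Count": [10, 30, 60],
})

# Each row's contribution to the overall weighted average.
# The scalar total is applied to every row at once, so no loop is needed.
total_reviews = dfclean["Review_Count"].sum()
dfclean["weightedavg"] = (
    dfclean["Star_Rating"] * dfclean["Review_Count"] / total_reviews
)

# The per-row contributions add up to the weighted mean of the ratings.
overall = dfclean["weightedavg"].sum()

# Cross-check against NumPy's built-in weighted average.
check = np.average(dfclean["Star_Rating"], weights=dfclean["Review_Count"])

print(dfclean)
print(overall, check)
```

Because the arithmetic is done on whole columns, pandas matches rows by index label, so this should also work when dfclean no longer has a default 0..n-1 index (for example after dropping rows).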
