Pro Pro - 1 year ago 77
Python Question

Using count() in Python

I need to get the top 5 movies ordered by average rating, ignoring any movies with fewer than 50 ratings. Can I do that using count()?

The code:

import numpy as np
import pandas as pd
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv(
sep='\t', names=r_cols)
grouped_data = ratings['rating'].groupby(ratings['movie_id'])
# Average the ratings for each movie
average_ratings = grouped_data.mean()
print("Average ratings:")
print(average_ratings.head())

Answer Source

As is often the case in pandas, there are many ways to do this. Here are a few:

1. Apply several functions to the groupby

Apply both mean and count to the groupby:

In [1]: df = ratings['rating'].groupby(ratings['movie_id']).agg(['mean', 'count'])
       mean  count
1  3.878319    452
2  3.206107    131
3  3.033333     90

Then you can filter it and return the 5 largest:

In [2]: df.loc[df['count'] >= 50, 'mean'].nlargest(5)

408    4.491071
318    4.466443
169    4.466102
483    4.456790
114    4.447761
Name: mean, dtype: float64
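
To make the aggregate-then-filter idea easy to try without the ratings file, here is a minimal sketch on made-up data. The toy DataFrame and the `MIN_RATINGS` name are assumptions for illustration only, and the threshold is lowered from 50 to 2 so that some movies survive the filter.

```python
import pandas as pd

# Hypothetical toy data standing in for the real ratings file.
ratings = pd.DataFrame({
    "movie_id": [1, 1, 1, 2, 2, 3],
    "rating":   [5, 4, 3, 2, 4, 5],
})

# The question uses 50; lowered here so the toy data has survivors.
MIN_RATINGS = 2

# One pass over the groups gives both the average and the number of ratings.
stats = ratings.groupby("movie_id")["rating"].agg(["mean", "count"])

# Keep movies with enough ratings, then take the highest averages.
top = stats.loc[stats["count"] >= MIN_RATINGS, "mean"].nlargest(5)
print(top)
```

Movie 3 has the highest average but only one rating, so it is dropped before the ranking is taken.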

2. Use boolean indexing after the fact

This assumes you have already run all the code in your question, so average_ratings already exists.

movie_count = ratings.movie_id.value_counts()
at_least_50_votes = movie_count.index[movie_count >= 50]
# Apply that to your average_ratings and return the 5 largest
average_ratings.loc[at_least_50_votes].nlargest(5)
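
The last step of this approach can be sketched end to end as follows. This is one possible way to apply the selection, not the only one: the toy data, the `MIN_RATINGS` name, and the use of `Index.isin` to build the boolean mask are assumptions for illustration, and the threshold is again lowered from 50 to 2.

```python
import pandas as pd

# Hypothetical toy data; the real ratings come from the file in the question.
ratings = pd.DataFrame({
    "movie_id": [10, 10, 10, 20, 20, 30],
    "rating":   [3, 4, 5, 1, 2, 5],
})
average_ratings = ratings["rating"].groupby(ratings["movie_id"]).mean()

MIN_RATINGS = 2  # the question uses 50

# Count ratings per movie and keep the ids that have enough of them.
movie_count = ratings["movie_id"].value_counts()
enough_votes = movie_count.index[movie_count >= MIN_RATINGS]

# Boolean mask over average_ratings: True where the movie id qualifies.
has_enough = average_ratings.index.isin(enough_votes)

top_movies = average_ratings[has_enough].sort_values(ascending=False).head(5)
print(top_movies)
```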

3. Use groupby.filter

ratings.groupby('movie_id').filter(lambda x: len(x) >= 50).groupby('movie_id')['rating'].mean().sort_values(ascending=False).head(5)
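
For comparison, here is a small sketch of the filter approach next to a `transform('count')` variant, which is a different technique that should give the same rows and may be quicker on large data because it does not call a Python lambda for every group. The toy data and the `MIN_RATINGS` name are assumptions, with the threshold lowered from 50 to 2.

```python
import pandas as pd

# Hypothetical toy data standing in for the real ratings file.
ratings = pd.DataFrame({
    "movie_id": [1, 1, 1, 2, 2, 3],
    "rating":   [5, 4, 3, 2, 4, 5],
})
MIN_RATINGS = 2  # the question uses 50

# filter: keep every row whose movie has enough ratings.
kept = ratings.groupby("movie_id").filter(lambda g: len(g) >= MIN_RATINGS)

# transform: attach the group size to each row, then mask on it.
sizes = ratings.groupby("movie_id")["rating"].transform("count")
kept_alt = ratings[sizes >= MIN_RATINGS]

top_filter = kept.groupby("movie_id")["rating"].mean().nlargest(5)
top_transform = kept_alt.groupby("movie_id")["rating"].mean().nlargest(5)
print(top_filter)
```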