Pro Pro - 1 month ago 4
Python Question

Using count() in Python

I need to get the top 5 movies ordered by average rating, ignoring any movies with fewer than 50 ratings. Can I do that using count()?

The code:

import numpy as np
import pandas as pd
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv(
'http://files.grouplens.org/datasets/movielens/ml-100k/u.data',
sep='\t', names=r_cols)
ratings.head()
# Group the ratings by movie
grouped_data = ratings['rating'].groupby(ratings['movie_id'])
# Average rating per movie
average_ratings = grouped_data.mean()
print("Average ratings:")
print(average_ratings.head())

Answer

As is often the case in pandas, there are several ways to do this. Here are three:

1. Apply several functions to the groupby

Apply both mean and count to the groupby:

In [1]: df = ratings['rating'].groupby(ratings['movie_id']).agg(['mean', 'count'])
        df.head(3)
Out[1]:
              mean  count
movie_id
1         3.878319    452
2         3.206107    131
3         3.033333     90

Then filter on the count and return the 5 largest means:

In [2]: df.loc[df['count'] >= 50, 'mean'].nlargest(5)

Out[2]:
movie_id
408    4.491071
318    4.466443
169    4.466102
483    4.456790
114    4.447761
Name: mean, dtype: float64
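
As a minimal sketch, the two steps above can be wrapped in a small helper and tried on toy data without downloading anything. The function name top_movies, its parameters, and the toy_ratings frame are hypothetical, introduced only for illustration; the threshold is lowered to 3 so the tiny sample has something to filter.

```python
import pandas as pd

# Hypothetical toy data standing in for the MovieLens ratings frame.
toy_ratings = pd.DataFrame({
    "movie_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "rating":   [5, 4, 5, 5, 5, 2, 3, 1],
})


def top_movies(ratings, min_ratings=50, n=5):
    """Return the n highest mean ratings among movies with at least min_ratings ratings."""
    # One pass over the groups computes both statistics per movie.
    stats = ratings.groupby("movie_id")["rating"].agg(["mean", "count"])
    # Keep only movies with enough ratings, then take the n largest means.
    return stats.loc[stats["count"] >= min_ratings, "mean"].nlargest(n)


# Movie 2 has the best mean but only two ratings, so it is dropped.
top = top_movies(toy_ratings, min_ratings=3, n=2)
print(top)
```

On the real data the call would be top_movies(ratings), assuming ratings has the movie_id and rating columns from the question.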

2. Use boolean indexing after the fact

This assumes you have already run all the code in your question, so average_ratings already exists.

# Number of ratings per movie, indexed by movie_id
movie_count = ratings.movie_id.value_counts()
# movie_ids that have at least 50 ratings
at_least_50_ratings = movie_count.index[movie_count >= 50]
# Apply that to your average_ratings, sort, and return the top 5
average_ratings.loc[at_least_50_ratings].sort_values(ascending=False).head(5)
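
To answer the question as asked, count() on the same grouped object should also work, since it returns a Series indexed by movie_id just like mean() does. The sketch below mirrors the question's grouped_data on hypothetical toy data (toy_ratings, min_ratings, and the lowered threshold of 3 are assumptions made for illustration).

```python
import pandas as pd

# Hypothetical toy data; the real frame comes from the MovieLens file.
toy_ratings = pd.DataFrame({
    "movie_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "rating":   [5, 4, 5, 5, 5, 2, 3, 1],
})

grouped = toy_ratings["rating"].groupby(toy_ratings["movie_id"])
average_ratings = grouped.mean()   # mean rating per movie
rating_counts = grouped.count()    # number of ratings per movie

min_ratings = 3  # would be 50 for the question's data

# Both Series share the movie_id index, so the boolean mask lines up.
mask = rating_counts >= min_ratings
best = average_ratings[mask].sort_values(ascending=False).head(2)
print(best)
```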

3. Using groupby.filter

(ratings.groupby('movie_id')
        .filter(lambda x: len(x) >= 50)    # keep rows of movies with at least 50 ratings
        .groupby('movie_id')['rating']
        .mean()
        .sort_values(ascending=False)
        .head(5))
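
The second groupby is needed because filter returns the surviving rows of the original frame rather than one row per group. A small sketch on hypothetical toy data (toy_ratings and the threshold of 3 are assumptions for illustration) makes that visible:

```python
import pandas as pd

# Hypothetical toy data with the same kind of columns as the question's frame.
toy_ratings = pd.DataFrame({
    "user_id":  [10, 11, 12, 10, 11, 10, 11, 12],
    "movie_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "rating":   [5, 4, 5, 5, 5, 2, 3, 1],
})

# filter drops whole groups: every row of movie 2 goes, since it has only 2 ratings.
kept = toy_ratings.groupby("movie_id").filter(lambda g: len(g) >= 3)
print(kept)  # still one row per rating, with all original columns

# Group again to aggregate the rows that survived.
result = kept.groupby("movie_id")["rating"].mean().sort_values(ascending=False)
print(result)
```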