
# Using count() in Python

I need to get the top 5 movies ordered by average rating, ignoring any movie with fewer than 50 ratings. Can I do that using `count()`?

The code:

```python
import numpy as np
import pandas as pd

r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv(
    'http://files.grouplens.org/datasets/movielens/ml-100k/u.data',
    sep='\t', names=r_cols)
grouped_data = ratings['rating'].groupby(ratings['movie_id'])
## average and combine
average_ratings = grouped_data.mean()
print("Average ratings:")
```

As is often the case in pandas, there are several ways to do this. Here are a couple:

1. Apply several functions to the groupby

Apply both mean and count to the groupby:

```
In [1]: df = ratings['rating'].groupby(ratings['movie_id']).agg(['mean', 'count'])

In [2]: df.head(3)
Out[2]:
              mean  count
movie_id
1         3.878319    452
2         3.206107    131
3         3.033333     90
```

Then you can filter it and return the 5 largest:

```
In [3]: df.loc[df['count'] >= 50, 'mean'].nlargest(5)
Out[3]:
movie_id
408    4.491071
318    4.466443
169    4.466102
483    4.456790
114    4.447761
Name: mean, dtype: float64
```
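To try the aggregate-then-filter approach without downloading the dataset, here is a minimal sketch on a toy frame. The data and the names `min_ratings`, `stats`, and `top` are made up for illustration; `min_ratings = 3` stands in for the threshold of 50 from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the MovieLens `ratings` frame.
ratings = pd.DataFrame({
    'movie_id': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'rating':   [5, 4, 3, 5, 5, 2, 3, 4, 3],
})
min_ratings = 3  # assumption: replaces the 50 used in the question

# One groupby yields both the average rating and the number of ratings.
stats = ratings.groupby('movie_id')['rating'].agg(['mean', 'count'])

# Keep movies with enough ratings, then take the highest averages.
top = stats.loc[stats['count'] >= min_ratings, 'mean'].nlargest(2)
print(top)
```

Movie 2 has the highest average here but only two ratings, so it is dropped before `nlargest` is applied.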

2. Use boolean indexing after the fact

This assumes you have already run all the code from your question, so `average_ratings` already exists:

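```python
movie_count = ratings.movie_id.value_counts()
average_ratings.loc[movie_count[movie_count >= 50].index].nlargest(5)
```

Alternatively, drop the movies with too few ratings using `filter` and chain the rest:

```python
ratings.groupby('movie_id').filter(lambda x: len(x) >= 50).groupby('movie_id')['rating'].mean().sort_values(ascending=False).head(5)
```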
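The `filter` chain can be checked on a toy frame as well. This is only a sketch under assumptions: the data is invented, and `min_ratings`, `kept`, and `top` are illustrative names, with `min_ratings = 3` standing in for 50.

```python
import pandas as pd

# Hypothetical toy data standing in for the MovieLens `ratings` frame.
ratings = pd.DataFrame({
    'movie_id': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'rating':   [5, 4, 3, 5, 5, 2, 3, 4, 3],
})
min_ratings = 3  # assumption: replaces the 50 used in the question

# filter() keeps only the rows of movies that have enough ratings.
kept = ratings.groupby('movie_id').filter(lambda g: len(g) >= min_ratings)

# Average the surviving movies and take the best ones.
top = (kept.groupby('movie_id')['rating']
           .mean()
           .sort_values(ascending=False)
           .head(2))
print(top)
```

Since the lambda runs once per group in Python, this version may be slower on large data than the aggregate-based approaches, though it reads naturally as a single chain.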