
Python Question

I need to get the top 5 movies ordered by average rating, ignoring any movies with fewer than 50 ratings. Can I do that using `count()`?

The code:

```
import numpy as np
import pandas as pd

r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv(
    'http://files.grouplens.org/datasets/movielens/ml-100k/u.data',
    sep='\t', names=r_cols)
ratings.head()

grouped_data = ratings['rating'].groupby(ratings['movie_id'])

# average and combine
average_ratings = grouped_data.mean()
print("Average ratings:")
print(average_ratings.head())
```

Answer

As is often the case in pandas, there are many ways to skin a cat. Here are a few:

**1. Apply several functions to the groupby**

Apply both `mean` and `count` to the groupby:

```
In [1]: df = ratings['rating'].groupby(ratings['movie_id']).agg(['mean', 'count'])
        df.head(3)
Out[1]:
              mean  count
movie_id
1         3.878319    452
2         3.206107    131
3         3.033333     90
```
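On pandas 0.25 or later, named aggregation is another way to get the same two columns, with the output names spelled out explicitly as `new_column=(source_column, function)`. This is a minimal sketch; the small `ratings` frame below is hypothetical toy data standing in for the MovieLens file so that it runs offline.

```python
import pandas as pd

# Hypothetical toy data standing in for the MovieLens u.data file
ratings = pd.DataFrame({
    'movie_id': [1, 1, 1, 2, 2, 3],
    'rating':   [4, 5, 3, 2, 4, 5],
})

# Named aggregation (pandas >= 0.25): new_column=(source_column, function)
stats = ratings.groupby('movie_id').agg(
    mean=('rating', 'mean'),
    count=('rating', 'count'),
)
print(stats)
```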

Then you can filter it and return the 5 largest:

```
In [2]: df.loc[df['count'] >= 50, 'mean'].nlargest(5)
Out[2]:
movie_id
408 4.491071
318 4.466443
169 4.466102
483 4.456790
114 4.447761
Name: mean, dtype: float64
```
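Wrapped as a reusable function, the same idea might look like the sketch below. The name `top_rated` and its `min_ratings` parameter are hypothetical, and the toy data is invented to check the boundary: a movie with exactly 50 ratings is kept by `>=`, while a movie with only 10 ratings is dropped even though its average is just as high.

```python
import pandas as pd

def top_rated(ratings, n=5, min_ratings=50):
    """Hypothetical helper: n highest mean ratings among movies with >= min_ratings ratings."""
    stats = ratings.groupby('movie_id')['rating'].agg(['mean', 'count'])
    # >= keeps a movie with exactly min_ratings ratings ("ignore fewer than 50")
    return stats.loc[stats['count'] >= min_ratings, 'mean'].nlargest(n)

# Hypothetical toy data: movie 1 has 60 ratings of 4, movie 2 has exactly 50
# ratings of 5, movie 3 has only 10 ratings of 5 and should be excluded
ratings = pd.DataFrame({
    'movie_id': [1] * 60 + [2] * 50 + [3] * 10,
    'rating':   [4] * 60 + [5] * 50 + [5] * 10,
})
print(top_rated(ratings))
```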

**2. Use boolean indexing after the fact**

This assumes you have already run all of the code in your question, so `average_ratings` already exists:

```
movie_count = ratings['movie_id'].value_counts()
# movie_ids with at least 50 ratings
at_least_50_votes = movie_count.index[movie_count >= 50]
# Apply that to your average_ratings, sort, and return the top 5
average_ratings.loc[at_least_50_votes].sort_values(ascending=False).head(5)
```
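To address the `count()` part of the question directly: calling `count()` on the same `grouped_data` object returns the number of ratings per movie, indexed by `movie_id` exactly like `average_ratings`, so the two Series can be combined with a boolean mask without a separate `value_counts` lookup. A sketch, again assuming hypothetical toy data in place of the downloaded file:

```python
import pandas as pd

# Hypothetical toy data in the same shape as the question's ratings frame
ratings = pd.DataFrame({
    'movie_id': [1] * 60 + [2] * 50 + [3] * 10,
    'rating':   [4] * 60 + [5] * 50 + [5] * 10,
})

grouped_data = ratings['rating'].groupby(ratings['movie_id'])
average_ratings = grouped_data.mean()
# count() on the same groupby: ratings per movie, indexed like average_ratings
rating_counts = grouped_data.count()

# Boolean mask aligned on movie_id, then sort and keep the top 5
top5 = average_ratings[rating_counts >= 50].sort_values(ascending=False).head(5)
print(top5)
```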

**3. Use `groupby.filter`**

```
(ratings.groupby('movie_id')
        .filter(lambda x: len(x) >= 50)   # keep only movies with at least 50 ratings
        .groupby('movie_id')['rating']
        .mean()
        .sort_values(ascending=False)
        .head(5))
```
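`groupby.filter` calls the Python lambda once per movie, which may be slow when there are many groups. One vectorized alternative (a swapped-in technique, not part of the original answer) is `transform('count')`, which broadcasts each movie's rating count back onto its rows so that a plain boolean mask can replace `filter`. The toy data is hypothetical:

```python
import pandas as pd

# Hypothetical toy data: movie 3 has too few ratings to qualify
ratings = pd.DataFrame({
    'movie_id': [1] * 60 + [2] * 50 + [3] * 10,
    'rating':   [4] * 60 + [5] * 50 + [5] * 10,
})

# transform('count') returns a Series aligned with ratings' rows, holding
# the number of ratings of each row's movie
per_row_count = ratings.groupby('movie_id')['rating'].transform('count')

top5 = (ratings[per_row_count >= 50]
        .groupby('movie_id')['rating']
        .mean()
        .sort_values(ascending=False)
        .head(5))
print(top5)
```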