Student - 1 year ago 182
Python Question

# How you calculate the average rating per genre in python?

I have this lens dataframe. It has columns to classify genres a movie belongs to. The genre categories are column names with binary values in the rows. If a movie belongs to a genre, it has a 1 under the appropriate column and 0 otherwise. I want to calculate the average rating per genre for each user in python pandas.

``````# pass in column names for each CSV
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
sep='|',names=u_cols, encoding='latin-1')

r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
sep='\t', names=r_cols, encoding='latin-1')

# Reading item file:
m_cols = ['movie_id', 'title' ,'release_date','video_release_date', 'imdb_url',
'unknown', 'Action', 'Adventure', 'Animation', 'Children\'s', 'Comedy',
'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
'Musical', 'Mystery', 'Romance', 'Sci-Fi','Thriller', 'War', 'Western']

sep='|', names=m_cols, encoding='latin-1')

# create one merged DataFrame
movie_ratings = pd.merge(movies, ratings)
lens = pd.merge(movie_ratings, users)

# I have tried this but don't know how to get the average of the ratings for each user.
df = pd.pivot_table(lens, index = ['user_id'],
columns = ['unknown', 'Action', 'Adventure', 'Animation',
'Children\'s', 'Comedy', 'Crime', 'Documentary',
'Drama', 'Fantasy', 'Film-Noir', 'Horror',
'Musical', 'Mystery', 'Romance', 'Sci-Fi',
'Thriller', 'War', 'Western'],
values = ['rating'])
print df
``````

Consider reshaping your dataframe from wide to long to create a genre column and then run result through `pivot_table()` using its `aggfunc` argument, specifically for numpy mean:

``````import pandas as pd
import numpy as np

#...same code...

lens = pd.merge(movie_ratings, users)

genrecols = ['unknown', 'Action', 'Adventure', 'Animation',
'Children\'s', 'Comedy', 'Crime', 'Documentary',
'Drama', 'Fantasy', 'Film-Noir', 'Horror',
'Musical', 'Mystery', 'Romance', 'Sci-Fi',
'Thriller', 'War', 'Western']

# RESHAPE DF BY MELTING (WIDE TO LONG), SELECTING ONLY NEEDED FIELDS
mdf = pd.melt(lens[['user_id', 'sex', 'rating'] + genrecols],
id_vars=['user_id', 'sex', 'rating'], var_name='genre')

# FILTER FOR VALUE = 1 AND THREE NEEDED COLUMNS
mdf = mdf[mdf['value']==1][['user_id', 'sex', 'rating', 'genre']]

# RUN PIVOTED AGGREGATION
df = pd.pivot_table(mdf, columns = ['genre'], index = ['user_id', 'sex'],
values = ['rating'], aggfunc = np.mean)

print df
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download