Pro Pro - 25 days ago 5
Python Question

Include and exlude in pandas (python)

The code count the rating of all the users (user_id) I want to count the ratings from only every user with ALLAN as sName .
The main code

grouped_data = ratings['rating'].groupby(ratings['movie_id'])
average_ratings = grouped_data.mean()
movie_count = ratings.movie_id.value_counts()
higher_than_50_votes = movie_count.index[movie_count > 50]
average_ratings.ix[higher_than_50_votes].sort_values(ascending=False).head(5)


A part of the first table.

user_id movie_id rating
196 242 3
22 302 3
90 377 1
10 51 2
2 346 1
1 474 4
8 265 2
4 465 5
2 451 3
1 451 5


A part of the second table.

user_id Sname
1| AKERS
2| other
3| ALEXANDER
4| ALBERT
5| ALBERT
6| ANSEL
7| ALLARD
8| ALLAN
9| ALLAN

Answer

Several ways to do this:

1.Merge the Sname columns in the ratings dataframe, on "user_id"

ratings_with_names = ratings.merge(names, on='user_id')

This gives you something like:

    user_id     movie_id    rating  unix_timestamp  Sname
0   6           86          3       883603013       ANSEL
1   6           14          5       883599249       ANSEL
2   6           98          5       883600680       ANSEL
3   6           463         4       883601713       ANSEL 

So now it's easy to only select the rows you need with boolean indexing

ratings_with_names[ratings_with_names.Sname == 'ALLAN']

2.Find the user_id that match a condition in the second dataframe, and use that to filter on the first dataframe:

ratings[ratings.user_id.isin(names.ix[names.Sname == 'ALLAN', 'user_id'])]
Comments