yas yasi - 1 month ago
Python Question

Counting the number of ratings without a loop in Python

In Python, given a list of ratings loaded as follows:

import pandas as pd
path = 'ratings_ml100k.csv'

data = pd.read_csv(path, sep=',')
print(data)
       user_id  item_id  rating
28422      100      690       4
32020      441      751       4
15819      145      265       5


where the items are:

print(itemsTrain)
[ 690 751 265 ..., 1650 1447 1507]


For each item, I would like to compute the number of ratings. Is there any way to do this without resorting to a loop? All ideas are appreciated.

data is a pandas DataFrame. The desired output should look like this:

pop =
   item_id  rating_count
       690           120
       751            10
       265           159
       ...           ...

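A minimal sketch of how a frame in exactly this two-column shape can be built without a loop, assuming data has the user_id, item_id and rating columns shown above. The sample rows below are hypothetical stand-ins for ratings_ml100k.csv, and the name pop simply mirrors the desired output.

```python
import pandas as pd

# Hypothetical sample in the same layout as the question's `data`
# (user_id, item_id, rating); the real data comes from ratings_ml100k.csv.
data = pd.DataFrame({
    'user_id': [100, 441, 145, 12, 7],
    'item_id': [690, 751, 265, 690, 265],
    'rating':  [4, 4, 5, 3, 2],
})

# One vectorized pass: group the rows by item_id, count the rows in each group,
# then turn the resulting Series back into a two-column DataFrame.
pop = (
    data.groupby('item_id')
        .size()
        .reset_index(name='rating_count')
)
print(pop)
```

size() counts rows per group; if some ratings could be missing (NaN) and should not be counted, groupby('item_id')['rating'].count() is the variant that skips nulls.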

Note that itemsTrain contains the unique item_ids in the rating dataset data.
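Since itemsTrain already lists the item IDs of interest, one possible approach is to count with value_counts() and then align the result to itemsTrain with reindex(), so the output follows that order and any item without ratings shows 0. This is a sketch under that assumption; both data and itemsTrain below are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's `data` and `itemsTrain`.
data = pd.DataFrame({
    'user_id': [100, 441, 145, 12],
    'item_id': [690, 751, 265, 690],
    'rating':  [4, 4, 5, 3],
})
itemsTrain = np.array([690, 751, 265, 1650])  # 1650 has no ratings in this sample

# value_counts() tallies each item_id without an explicit loop; reindex() puts
# the result in itemsTrain order and fills items that never occur with 0.
counts = data['item_id'].value_counts().reindex(itemsTrain, fill_value=0)

# Name the index explicitly, then flatten to the item_id / rating_count layout.
pop = counts.rename_axis('item_id').reset_index(name='rating_count')
print(pop)
```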

Answer

You can do it this way:

In [200]: df = pd.DataFrame(np.random.randint(0,8,(15,2)),columns=['id', 'rating'])

In [201]: df
Out[201]:
    id  rating
0    4       6
1    0       1
2    2       4
3    2       5
4    2       7
5    3       5
6    6       1
7    4       3
8    4       3
9    3       2
10   2       4
11   7       7
12   3       1
13   2       7
14   7       3

In [202]: df.groupby('id').rating.count()
Out[202]:
id
0    1
2    5
3    3
4    3
6    1
7    2
Name: rating, dtype: int64
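One detail worth knowing about the call above: count() on the rating column counts only non-null values, whereas size() counts every row in the group. The small frame below is hypothetical and includes one missing rating to show where the two differ.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which one rating is missing (NaN).
df = pd.DataFrame({'id': [0, 0, 2, 2, 2], 'rating': [1, np.nan, 4, 5, 4]})

# count() tallies only the non-null ratings in each group ...
non_null = df.groupby('id').rating.count()
# ... while size() tallies rows, whether or not rating is null.
rows = df.groupby('id').size()

print(pd.DataFrame({'count': non_null, 'size': rows}))
```

For a ratings file with no missing values both give the same numbers.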

You can also count the number of unique ratings per id:

In [203]: df.groupby('id').rating.nunique()
Out[203]:
id
0    1
2    3
3    3
4    2
6    1
7    2
Name: rating, dtype: int64
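If both numbers are wanted side by side, a sketch using pandas named aggregation (available since pandas 0.25) computes them in one groupby call. The frame and the output column names rating_count and unique_ratings are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical frame shaped like the answer's `df` (id, rating).
df = pd.DataFrame({'id': [4, 0, 2, 2, 2, 3], 'rating': [6, 1, 4, 5, 4, 5]})

# Named aggregation: each keyword becomes an output column, built from
# a (source column, aggregation function) pair.
summary = df.groupby('id').agg(
    rating_count=('rating', 'count'),
    unique_ratings=('rating', 'nunique'),
)
print(summary)
```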