sanaz - 2 months ago 4x

Python Question

I have the following data frame:

`data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})`

product_id user_id

p1 a1

p1 a1

p2 a1

p1 a2

p1 a2

p1 a2

p2 a3

p2 a3

p3 a3

in real case there might be some other columns as well, but what i need to do is to group by data frame by product_id and user_id columns and count number of each combination and add it as a new column in a new dat frame

output should be something like this:

`user_id product_id count`

a1 p1 2

a1 p2 1

a2 p1 3

a3 p2 2

a3 p3 1

I have tried the following code:

`grouped=data.groupby(['user_id','product_id']).count()`

but the result is:

`user_id product_id`

a1 p1

p2

a2 p1

a3 p2

p3

actually the most important thing for me is to have a column names count that has the number of occurrences , i need to use the column later.

Answer

Maybe this is what you want?

```
>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id product_id
a1 p1 2
p2 1
a2 p1 3
a3 p2 2
p3 1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
user_id product_id size
0 a1 p1 2
1 a1 p2 1
2 a2 p1 3
3 a3 p2 2
4 a3 p3 1
>>> new_df['size']
0 2
1 1
2 3
3 2
4 1
Name: size, dtype: int64
```

Source (Stackoverflow)

Comments