Jivan Jivan - 26 days ago 6
Python Question

Pandas - get a summarized pivot DataFrame from row counts

Given this DataFrame:

bowl cookie
0 one chocolate
1 two chocolate
2 two chocolate
3 two vanilla
4 one vanilla
5 one vanilla
6 one vanilla
7 one vanilla
8 one vanilla
9 two chocolate


I'd like to obtain the following summarized DataFrame:

vanilla chocolate
one 5 1
two 1 3


Apart from proceeding manually with:

vanilla_bowl1 = len(df_picks[(df_picks['bowl'] == 'one') & (df_picks['cookie'] == 'vanilla')])
vanilla_bowl2 = len(df_picks[(df_picks['bowl'] == 'two') & (df_picks['cookie'] == 'vanilla')])
chocolate_bowl1 = ...
chocolate_bowl2 = ...


Is there a way to do that in a single operation with
Pandas
?




Note: I've had a look at
df.pivot()
and this would work provided that I add a column
count
equal to
1
in each row:

bowl cookie count
0 one chocolate 1
1 two chocolate 1
2 two chocolate 1
3 two vanilla 1
4 one vanilla 1
5 one vanilla 1
6 one vanilla 1
7 one vanilla 1
8 one vanilla 1
9 two chocolate 1


And then

df.pivot(index='bowl', columns='cookie', values='count')


However, I'm wondering if there is a more direct method, that wouldn't require adding the
count
column in the first place.

Answer

The most concise way is probably the pandas.crosstab function:

>>> pandas.crosstab(d.bowl, d.cookie)
cookie  chocolate  vanilla
bowl                      
one             1        5
two             3        1