max max - 6 months ago 20
Python Question

Conditional counting within groups

I wanted to do conditional counting after

groupby
; for example, group by values of column
A
, and then count within each group how often value
5
appears in column
B
.

If I was doing this for the entire
DataFrame
, it's just
len(df[df['B']==5])
. So I hoped I could do
df.groupby('A')[df['B']==5].size()
. But I guess boolean indexing doesn't work within
GroupBy
objects.

Example:

import pandas as pd
df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})
groups = df.groupby('A')
# some more code
# in the end, I want to get pd.Series({0: 1, 1: 2, 6: 0})

Answer

Select all rows where B equals 5, and then apply groupby/size:

In [43]: df.loc[df['B']==5].groupby('A').size()
Out[43]: 
A
0    1
4    2
dtype: int64

Alternatively, you could use groupby/agg with a custom function:

In [44]: df.groupby('A')['B'].agg(lambda ser: (ser==5).sum())
Out[44]: 
A
0    1
4    2
Name: B, dtype: int64

Note that generally speaking, using agg with a custom function will be slower than using groupby with a builtin method such as size. So prefer the first option over the second.

In [45]: %timeit df.groupby('A')['B'].agg(lambda ser: (ser==5).sum())
1000 loops, best of 3: 927 µs per loop

In [46]: %timeit df.loc[df['B']==5].groupby('A').size()
1000 loops, best of 3: 649 µs per loop
Comments