ejkt ejkt - 4 months ago 53
Python Question

Pandas Compute conditional count for groupby including zero counts

Is there a way in Pandas to count the number of rows containing a specific value based on a group including those groups containing no value?

For instance if I have this dataframe:

dd = pd.DataFrame({'g1':['a','b','a','b','a','b','c','c'],\
'g2':['x','x','z','y','y','z','x','z'],\
'cond':['i','i','i','j','j','j','k','k']})
dd[['g1','g2','cond']]

g1 g2 cond
0 a x i
1 b x i
2 a z i
3 b y j
4 a y j
5 b z j
6 c x k
7 c z k


I would like to get all of the counts of j by group, including if the group does not have any rows containing 'j'. The ideal output would look like this:

a x 0
y 1
z 0
b x 0
y 1
z 1
c x 0
z 0


If I try and take the conditional slice beforehand, I get a truncated answer which I do not want.

dd.ix[dd['cond']=='j',:].groupby(['g1','g2'])['cond'].count()
g1 g2
a y 1
b y 1
z 1


Additionally, I do not want a pivot table since the data frame does not contain every combination of g1 and g2.

Thanks in advance.

Answer

Just apply a function that counts the js.

>>> dd.groupby(['g1', 'g2']).cond.apply(lambda g: (g=='j').sum())
g1  g2
a   x     0
    y     1
    z     0
b   x     0
    y     1
    z     1
c   x     0
    z     0
Name: cond, dtype: int64