ejkt - 4 months ago 53

Python Question

Is there a way in Pandas to count the number of rows containing a specific value based on a group including those groups containing no value?

For instance if I have this dataframe:

`dd = pd.DataFrame({'g1':['a','b','a','b','a','b','c','c'],\`

'g2':['x','x','z','y','y','z','x','z'],\

'cond':['i','i','i','j','j','j','k','k']})

dd[['g1','g2','cond']]

g1 g2 cond

0 a x i

1 b x i

2 a z i

3 b y j

4 a y j

5 b z j

6 c x k

7 c z k

I would like to get all of the counts of j by group, including if the group does not have any rows containing 'j'. The ideal output would look like this:

`a x 0`

y 1

z 0

b x 0

y 1

z 1

c x 0

z 0

If I try and take the conditional slice beforehand, I get a truncated answer which I do not want.

`dd.ix[dd['cond']=='j',:].groupby(['g1','g2'])['cond'].count()`

g1 g2

a y 1

b y 1

z 1

Additionally, I do not want a pivot table since the data frame does not contain every combination of g1 and g2.

Thanks in advance.

Answer

Just `apply`

a function that counts the js.

```
>>> dd.groupby(['g1', 'g2']).cond.apply(lambda g: (g=='j').sum())
g1 g2
a x 0
y 1
z 0
b x 0
y 1
z 1
c x 0
z 0
Name: cond, dtype: int64
```