ccsv - 1 year ago 69

Python Question

In

`pandas`

`pt=df.groupby(['Group']).sum().reset_index()`

Lets say I want to apply a lambda function

`lambda x: (0 < x).sum()`

`import pandas as pd`

df=pd.DataFrame({'Group':['W', 'W', 'W', 'E','E','E','N'],

'A':[0,1,5,0,1,5,7],

'B':[1,0,5,0,0,2,0],

'C':[1,1,5,0,0,5,0],

'Total':[2,2,15,0,1,12,7]

})

#Check how many items are present in Group

grp=df.groupby(['Group'])

pt1 = grp['A'].apply(lambda x: (0 < x).sum()).reset_index()

pt2 = grp['B'].apply(lambda x: (0 < x).sum()).reset_index()

pt3 = grp['C'].apply(lambda x: (0 < x).sum()).reset_index()

pct=pd.merge(pt1, pt2, on=['Group'])

pct=pd.merge(pt2, pct, on=['Group'])

#Get total items and merge with counts

pt = df.groupby(['Group'])['Total'].count().reset_index()

pct=pd.merge(pt, pct, on=['Group'])

Output:

`Group Total C A B`

0 E 3 1 2 1

1 N 1 0 1 0

2 W 3 3 2 2

What is a efficient way to write it for n columns?

Answer Source

The cleanest way I can think of is this:

```
(df > 0).groupby(df['Group']).agg({'A': 'sum', 'B': 'sum', 'C': 'sum', 'Total': 'count'})
Out:
C Total B A
Group
E 1.0 3 1.0 2.0
N 0.0 1 0.0 1.0
W 3.0 3 2.0 2.0
```

You can sort and cast to int if you want:

```
((df > 0).groupby(df['Group']).agg({'A': 'sum', 'B': 'sum', 'C': 'sum', 'Total': 'count'})
.sort_index(axis=1).astype('int')
Out:
A B C Total
Group
E 2 1 1 3
N 1 0 0 1
W 2 2 3 3
```