piRSquared - 1 year ago 51

Python Question

Consider the following

`df`

`df = pd.DataFrame([`

['X', 'a', 0, 1],

['X', 'b', 2, 3],

['X', 'c', 4, 5],

['Y', 'a', 6, 7],

['Y', 'b', 8, 9],

['Y', 'c', 10, 11],

], columns=['One', 'Two', 'Three', 'Four'])

df

`df.dtypes`

One object

Two object

Three int64

Four int64

dtype: object

When I

`df.sum()`

`sum`

`df.sum()`

One XXXYYY

Two abcabc

Three 30

Four 36

dtype: object

However, I'd like to perform this within a

`groupby`

`df.groupby('One').sum()`

But it appears to only sum over numeric columns. What is a convenient way to perform the same summation as

`df.sum()`

I'd expect this result

`pd.concat([df.set_index('One').loc[i].sum() for i in ['X', 'Y']],`

axis=1, keys=['X', 'Y']).T.rename_axis('One')

Answer Source

It's possible to achieve your desired result by using `agg`

with a `lambda`

:

```
In [6]:
df.groupby('One').agg(lambda x: x.sum())
Out[6]:
Two Three Four
One
X abc 6 9
Y abc 24 27
```