piRSquared piRSquared - 3 months ago 14
Python Question

How to sum within a groupby over with both numeric and non-numeric data types

Consider the following

df


df = pd.DataFrame([
['X', 'a', 0, 1],
['X', 'b', 2, 3],
['X', 'c', 4, 5],
['Y', 'a', 6, 7],
['Y', 'b', 8, 9],
['Y', 'c', 10, 11],
], columns=['One', 'Two', 'Three', 'Four'])
df


enter image description here

df.dtypes

One object
Two object
Three int64
Four int64
dtype: object


When I
df.sum()
I get what
sum
would do over the each of the columns.

df.sum()

One XXXYYY
Two abcabc
Three 30
Four 36
dtype: object


However, I'd like to perform this within a
groupby
. I'd expect this to work

df.groupby('One').sum()


enter image description here

But it appears to only sum over numeric columns. What is a convenient way to perform the same summation as
df.sum()
?

I'd expect this result

pd.concat([df.set_index('One').loc[i].sum() for i in ['X', 'Y']],
axis=1, keys=['X', 'Y']).T.rename_axis('One')


enter image description here

Answer

It's possible to achieve your desired result by using agg with a lambda:

In [6]:
df.groupby('One').agg(lambda x: x.sum())

Out[6]:
     Two  Three  Four
One                  
X    abc      6     9
Y    abc     24    27