user2951327 user2951327 - 2 months ago 23
Python Question

pandas pivot table aggfunc troubleshooting

This

DataFrame
has two columns, both are object type.

Dependents Married
0 0 No
1 1 Yes
2 0 Yes
3 0 Yes
4 0 No


I want to aggregate 'Dependents' based on 'Married'.

table = df.pivot_table(
values='Dependents',
index='Married',
aggfunc = lambda x: x.map({'0':0,'1':1,'2':2,'3':3}).mean())


This works, however, surprisingly, the following doesn't:

table = df.pivot_table(values = 'Dependents',
index = 'Married',
aggfunc = lambda x: x.map(int).mean())


It will produce a
None
instead.

Can anyone help explain?

Answer

Both examples of code provided in your question work. However, they are not the idiomatic way to achieve what you want to do -- particularly the first one.

I think this is the proper way to obtain the expected behavior.

# Test data
df = DataFrame({'Dependents': ['0', '1', '0', '0', '0'],
                'Married': ['No', 'Yes', 'Yes', 'Yes', 'No']})

# Converting object to int
df['Dependents'] = df['Dependents'].astype(int)
# Computing the mean by group
df.groupby('Married').mean()

         Dependents
Married            
No             0.00
Yes            0.33

However, the following code works.

df.pivot_table(values = 'Dependents', index = 'Married', 
               aggfunc = lambda x: x.map(int).mean())

It is equivalent (and more readable) of converting to int with map before pivoting data.

df['Dependents'] = df['Dependents'].map(int)
df.pivot_table(values = 'Dependents', index = 'Married')

Edit

I you have messy DataFrame, you can use to_numeric with the error parameter set to coerce.

If coerce, then invalid parsing will be set as NaN

Here is the link to the Stack Overflow documentation concerning the change of types.

# Test data
df = DataFrame({'Dependents': ['0', '1', '2', '3+', 'NaN'], 
                 'Married': ['No', 'Yes', 'Yes', 'Yes', 'No']})

df['Dependents'] = pd.to_numeric(df['Dependents'], errors='coerce')
print(df)

   Dependents Married
0         0.0      No
1         1.0     Yes
2         2.0     Yes
3         NaN     Yes
4         NaN      No

print(df.groupby('Married').mean())

         Dependents
Married            
No              0.0
Yes             1.5
Comments