user1107049 - 2 months ago
Python Question

How to sum and to mean one DataFrame to create another DataFrame

I create a DataFrame with some duplicated values in the Name column:

import pandas as pd
df = pd.DataFrame({'Name': ['Will','John','John','John','Alex'],
'Payment': [15, 10, 10, 10, 15],
'Duration': [30, 15, 15, 15, 20]})



I would like to create another DataFrame in which the duplicated values in the Name column are consolidated, leaving no duplicates. At the same time, I want to sum the Payment values John made. I proceed with:

df_sum = df.groupby('Name', axis=0).sum().reset_index()



But since the df.groupby('Name', axis=0).sum() command applies the sum function to every column in the DataFrame, the Duration column (the length of the visit in minutes) is summed as well. Instead, I would like to get the average value for the Duration column, so I would need to use the mean() method, like so:

df_mean = df.groupby('Name', axis=0).mean().reset_index()



But with the mean() method, the Payment column now shows the average of the payments John made rather than their sum.

How can I create a DataFrame where the Duration column shows the average while the Payment column shows the sum?

Answer

You can apply different functions to different columns with groupby.agg:

df.groupby('Name').agg({'Duration': 'mean', 'Payment': 'sum'})
Out: 
      Payment  Duration
Name                   
Alex       15        20
John       30        15
Will       15        30
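
For completeness, here is a minimal self-contained sketch of the same groupby.agg approach, chained with reset_index() as in the question so that Name ends up as a regular column. The variable name result is only illustrative. On recent pandas versions the columns should follow the order of the dictionary keys and the mean column should come back as floats, so the printed layout may differ slightly from the output shown above.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'Name': ['Will', 'John', 'John', 'John', 'Alex'],
                   'Payment': [15, 10, 10, 10, 15],
                   'Duration': [30, 15, 15, 15, 20]})

# Sum Payment and average Duration within each Name group,
# then turn the Name index back into an ordinary column
result = (df.groupby('Name')
            .agg({'Payment': 'sum', 'Duration': 'mean'})
            .reset_index())

print(result)
```

Each key in the dictionary is a column to aggregate and each value is the function applied to that column, so further columns can be handled by adding more entries.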