user1107049 - 1 month ago
Python Question

How to sum one column and average another when grouping a DataFrame

After creating a DataFrame with some duplicated values in the Name column:

import pandas as pd
df = pd.DataFrame({'Name': ['Will', 'John', 'John', 'John', 'Alex'],
                   'Payment': [15, 10, 10, 10, 15],
                   'Duration': [30, 15, 15, 15, 20]})

I would like to create another DataFrame in which the duplicated values in the Name column are consolidated, leaving no duplicates. At the same time, I want to sum the payments John made. I proceed with:

df_sum = df.groupby('Name', axis=0).sum().reset_index()

But since
df.groupby('Name', axis=0).sum()
applies the sum function to every column in the DataFrame, the Duration column (the length of the visit in minutes) is summed as well. Instead, I would like to get the average value for the Duration column, so I would need to use the mean() method, like so:

df_mean = df.groupby('Name', axis=0).mean().reset_index()

But with the mean() function, the Payment column now shows the average payment John made rather than the sum of all his payments.

How can I create a DataFrame in which the Duration values are averaged while the Payment values are summed?


You can apply different functions to different columns with groupby.agg:

df.groupby('Name').agg({'Payment': 'sum', 'Duration': 'mean'})
      Payment  Duration
Name                   
Alex       15      20.0
John       30      15.0
Will       15      30.0
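
Since the question asks for a new DataFrame with Name as a regular column, here is a minimal end-to-end sketch built on the same agg call. The names df_agg and df_named are chosen here only for illustration, and the second variant assumes a pandas version that supports named aggregation (0.25 or later, as far as I know).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Will', 'John', 'John', 'John', 'Alex'],
                   'Payment': [15, 10, 10, 10, 15],
                   'Duration': [30, 15, 15, 15, 20]})

# Sum Payment and average Duration per Name, then move the
# Name index back into an ordinary column with reset_index().
df_agg = (df.groupby('Name')
            .agg({'Payment': 'sum', 'Duration': 'mean'})
            .reset_index())

# Alternative sketch using named aggregation: each keyword is the
# output column name, each tuple is (input column, function).
# as_index=False keeps Name as a column, so no reset_index() is needed.
df_named = df.groupby('Name', as_index=False).agg(
    Payment=('Payment', 'sum'),
    Duration=('Duration', 'mean'),
)

print(df_agg)
print(df_named)
```

Both variants should give one row per name, with Payment summed and Duration averaged. Note that mean returns floats, so Duration comes back as 20.0, 15.0 and 30.0 rather than integers.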