Jeff Jeff - 1 month ago 6
Python Question

Pandas Grouping - Values as Percent of Grouped Totals Based on Another Column

This question is an extension of a question I asked yesterday, but I will rephrase

Using a data frame and pandas, I am trying to figure out what the tip percentage is for each category in a group by.

So, using the tips database, I want to see, for each sex/smoker, what the tip percentage is is for female smoker / all female and for female non smoker / all female (and the same thing for men)

When I do this,

import pandas as pd
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()


I get the following:

total_bill tip
sex smoker
Female No 977.68 149.77
Yes 593.27 96.74
Male No 1919.75 302.00
Yes 1337.07 183.07


But I am looking for something more like this

Tip Pct
Female No 0.153189183
Yes 0.163062349
Male No 0.15731215
Yes 0.136918785


Where Tip Pct = sum(tip)/sum(total_bill) for each group

What am I doing wrong and how do I fix this? Thank you!

I understand that this would give me tip as a percentage of total tips:

(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x.sum()))


Is there a way to modify it to look at another column, i.e.

(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x['total_bill'].sum()))


Thanks!

Answer

You can use apply to loop through rows of the data frame (with axis = 1), where for each row you can access the tip and total_bill and divide them to get the percentage:

(df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()
   .apply(lambda r: r.tip/r.total_bill, axis = 1))

#sex     smoker
#Female  No        0.153189
#        Yes       0.163062
#Male    No        0.157312
#        Yes       0.136919
#dtype: float64
Comments