Jeff - 1 month ago 6
Python Question

# Pandas Grouping - Values as Percent of Grouped Totals Based on Another Column

This question is an extension of a question I asked yesterday, but I will rephrase

Using a data frame and pandas, I am trying to figure out what the tip percentage is for each category in a group by.

So, using the tips database, I want to see, for each sex/smoker, what the tip percentage is is for female smoker / all female and for female non smoker / all female (and the same thing for men)

When I do this,

``````import pandas as pd
df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()
``````

I get the following:

``````        total_bill  tip
sex smoker
Female  No  977.68  149.77
Yes 593.27  96.74
Male    No  1919.75 302.00
Yes 1337.07 183.07
``````

But I am looking for something more like this

``````        Tip Pct
Female  No  0.153189183
Yes 0.163062349
Male    No  0.15731215
Yes 0.136918785
``````

Where Tip Pct = sum(tip)/sum(total_bill) for each group

What am I doing wrong and how do I fix this? Thank you!

I understand that this would give me tip as a percentage of total tips:

``````(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x.sum()))
``````

Is there a way to modify it to look at another column, i.e.

``````(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x['total_bill'].sum()))
``````

Thanks!

You can use `apply` to loop through rows of the data frame (with `axis = 1`), where for each row you can access the `tip` and `total_bill` and divide them to get the percentage:

``````(df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()
.apply(lambda r: r.tip/r.total_bill, axis = 1))

#sex     smoker
#Female  No        0.153189
#        Yes       0.163062
#Male    No        0.157312
#        Yes       0.136919
#dtype: float64
``````