Jeff Jeff - 1 month ago 10
Python Question

Pandas Grouping - Values as Percent of Grouped Totals Not Working

Using a data frame and pandas, I am trying to figure out what each value is as a percentage of the grand total for the "group by" category

So, using the tips database, I want to see, for each sex/smoker, what the proportion of the total bill is for female smoker / all female and for female non smoker / all female (and the same thing for men)

For example,

If the complete data set is:

Sex, Smoker, Day, Time, Size, Total Bill
Female,No,Sun,Dinner,2, 20
Female,No,Mon,Dinner,2, 40
Female,No,Wed,Dinner,1, 10
Female,Yes,Wed,Dinner,1, 15


The values for the first line would be (20+40+10)/(20+40+10+15), as those are the other 3 values for non smoking females

So the output should look like

Female No 0.823529412
Female Yes 0.176470588


However, I seem to be having some trouble

When I do this,

import pandas as pd
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata- book/master/ch08/tips.csv", sep=',')
df.groupby(['sex', 'smoker'])[['total_bill']].apply(lambda x: x / x.sum()).head()


I get the following:

total_bill
0 0.017378
1 0.005386
2 0.010944
3 0.012335
4 0.025151


It seems to be ignoring the group by and just calculating it for each line item

I am looking for something more like

df.groupby(['sex', 'smoker'])[['total_bill']].sum()


Which will return

total_bill
sex smoker
Female No 977.68
Yes 593.27
Male No 1919.75
Yes 1337.07


But I want this expressed as percentages of totals for the total of the individual sex/smoker combinations or

Female No 977.68/(977.68+593.27)
Female Yes 593.27/(977.68+593.27)
Male No 1919.75/(1919.75+1337.07)
Male Yes 1337.07/(1919.75+1337.07)


Ideally, I would like to do the same with the "tip" column at the same time.

What am I doing wrong and how do I fix this? Thank you!

Answer

You can add another grouped by process after you get the sum table to calculate the percentage:

(df.groupby(['sex', 'smoker'])['total_bill'].sum()
   .groupby(level = 0).transform(lambda x: x/x.sum()))   # group by sex and calculate percentage

#sex     smoker
#Female  No        0.622350
#        Yes       0.377650
#Male    No        0.589455
#        Yes       0.410545
#dtype: float64