DataSwede - 5 months ago
Python Question

Dividing multiindex columns by sum to create percentages

I have a dataframe that is created from a pivot table, and looks similar to this:

import pandas as pd
d = {('company1', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 'April- 2014': 499.0, 'August- 2012': 431.0, 'August- 2013': 496.0, 'August- 2014': 221.0},
('company1', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0, 'April- 2014': 50.0, 'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 426.0},
('company1', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0, 'April- 2014': 24.0, 'August- 2012': 0.0, 'August- 2013': 0.0,'August- 2014': 77.0},
('company2', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 'April- 2014': 499.0, 'August- 2012': 431.0, 'August- 2013': 496.0, 'August- 2014': 221.0},
('company2', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0, 'April- 2014': 50.0, 'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 426.0},
('company2', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0, 'April- 2014': 24.0, 'August- 2012': 0.0, 'August- 2013': 0.0,'August- 2014': 77.0},}

df = pd.DataFrame(d)

                 company1          company2
                 FN    FP    TP    FN    FP    TP
April- 2012     112     0     0   112     0     0
April- 2013     370   544   140   370   544   140
April- 2014     499    50    24   499    50    24
August- 2012    431     0     0   431     0     0
August- 2013    496     0     0   496     0     0
August- 2014    221   426    77   221   426    77


I'm looking to iterate over the upper level of the MultiIndex columns and divide each company's values by its row sum to create a percentage:

                 company1          company2
                 FN    FP    TP    FN    FP    TP
April- 2012       1     0     0     1     0     0
April- 2013     .35   .52   .13   .35   .52   .13
April- 2014     .87   .09   .04   .87   .09   .04
etc.


I don't know the company names beforehand. This is a variation of a question asked yesterday: Summing multiple columns with multiindex columns

Answer

You can divide by the per-company row sums using the div method, whose level argument specifies which level of the column MultiIndex to match on:

df.div(df.sum(axis=1, level=0), level=0)
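
The one-liner above relies on the level argument of DataFrame.sum, which was deprecated in pandas 1.3 and removed in 2.0. On a newer pandas, one possible equivalent is to compute the per-company totals with a groupby on the transposed frame. The sketch below is only an illustration under that assumption: the two-row frame is a cut-down stand-in for the data in the question, and the names totals and pct are made up here.

```python
import pandas as pd

# Cut-down stand-in for the pivoted frame in the question (first two rows only).
columns = pd.MultiIndex.from_product(
    [["company1", "company2"],
     ["False Negative", "False Positive", "True Positive"]]
)
df = pd.DataFrame(
    [[112.0, 0.0, 0.0, 112.0, 0.0, 0.0],
     [370.0, 544.0, 140.0, 370.0, 544.0, 140.0]],
    index=["April- 2012", "April- 2013"],
    columns=columns,
)

# Row totals per company: transpose so the column MultiIndex becomes the row
# index, group on its top level, sum, and transpose back.
totals = df.T.groupby(level=0).sum().T

# Divide every column by the total of the company it belongs to,
# matching on level 0 of the column MultiIndex.
pct = df.div(totals, level=0)

print(pct.round(2))
```

The company names are never spelled out in the computation itself, so this works without knowing them beforehand. Note that a row whose company total is 0 would produce NaN, which does not occur in the sample data.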