hoof_hearted hoof_hearted - 3 months ago 20
Python Question

Pandas - Multiindex Division [i.e. Division by Group]

Aim: I'm trying to divide each row in a multilevel index by the total number in each group.

More specifically: Given the following data, I want to divide the number of Red and Blue marbles by the total number in each group (i.e. the sum across Date, Country and Colour)

Number
Date Country Colour
2011 US Red 4
Blue 6
2012 IN Red 9
IE Red 5
Blue 5
2013 JP Red 15
Blue 25


This would give the following answer:

Number
Date Country Colour
2011 US Red 0.4
Blue 0.6
2012 IN Red 1.0
IE Red 0.5
Blue 0.5
2013 JP Red 0.375
Blue 0.625


Here is the code to reproduce the data:

arrays = [np.array(['2011', '2011', '2012', '2012', '2012', '2013', '2013']),
np.array(['US', 'US', 'IN', 'IE', 'IE', 'JP', 'JP', 'GB']),
np.array(['Red', 'Blue', 'Red', 'Red', 'Blue', 'Red', 'Blue', 'Blue'])]

df = pd.DataFrame(np.random.rand(7, 1)*10, index=arrays, columns=['number'])
df.index.names = ['Date', 'Country', 'Colour']

Answer

A shorter version would be:

df.groupby(level=['Date', 'Country']).transform(lambda x: x/x.sum())

                     number
Date Country Colour        
2011 US      Red      0.400
             Blue     0.600
2012 IN      Red      1.000
     IE      Red      0.500
             Blue     0.500
2013 JP      Red      0.375
             Blue     0.625