toasteez toasteez - 1 year ago 135
Python Question

Adding and Renaming a Column in a Multiindex DataFrame

The purpose of this post is to understand how to add a column to a level in a


Create the DataFrame

import pandas as pd

df = pd.DataFrame(
[5777, 100, 5385, 200, 5419, 4887, 100, 200],
[4849, 0, 4539, 0, 3381, 0, 0, ],
[4971, 0, 3824, 0, 4645, 3424, 0, 0, ],
[4827, 200, 3459, 300, 4552, 3153, 100, 200, ],
[5207, 0, 3670, 0, 4876, 3358, 0, 0, ],
[('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'injection'),
('Portfolio B', 'EUR', 'amount'), ('Portfolio B', 'EUR', 'injection'),
('Portfolio A', 'USD', 'amount'), ('Portfolio A', 'USD', 'injection'),
('Portfolio B', 'JPY', 'amount'), ('Portfolio B', 'JPY', 'injection')])

print df

I would like to use the following method to add a new column to each currency at level 2 named daily_added_value:

def do_nothing(group):
return group

def calc_daily_added_value(group):
g = (group['amount'] - group['amount'].shift(periods=1, freq=None, axis=0)
-df['injection'].shift(periods=1, freq=None, axis=0)).round(decimals=2)
g.index = ['daily_added_value']
return g

pd.concat([df.T.groupby(level=0).apply(f).T for f in [calc_daily_added_value,do_nothing ]], axis=1).sort_index(axis=1)

However this throws a key error:
KeyError: 'amount'

What is the correct syntax for the method

Following on from the answer below there is still an issue

Adding the daily return works

dav = df.loc[:, pd.IndexSlice[:, :, 'daily_added_value']]
amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
dr = (dav.values / amount.shift()) * 100
dr.columns.set_levels(['daily_return'], level=2, inplace=True)
df = pd.concat([df, dr], axis=1).sortlevel(axis=1)

Adding the cumulative compounded returns FAILS

dr = df.loc[:, pd.IndexSlice[:, :, 'daily_return']]
drc = 100*((1+dr / 100).cumprod()-1)
drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)
df = pd.concat([df, drc], axis=1).sort_index(axis=1)

this fails because it is missing the .values, but if I add this it becomes an array?

What is strange here though is that drc is in fact a DataFrame of correct shaped etc. and appears to contain correct results.

This fails on this line:

drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)

Error is
ValueError: On level 2, label max (2) >= length of level (1). NOTE: this index is in an inconsistent state

How can the index be placed back into a consistent state?

Answer Source

Skip the groupby it is not necessary

amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
inject = df.loc[:, pd.IndexSlice[:, :, 'injection']]
dav = amount - amount.shift() - inject.shift().values
#dav.columns.set_levels(['daily_added_value'], level=2, inplace=True)

pd.concat([df, dav], axis=1).sort_index(axis=1).T

Note: I used T to get a picture that would easily fit

enter image description here

there appears to be a bug in set_levels and as such it is not advised to use it.

Workaround to rename the MultiIndex Column in the DataFrame dav

def map_level(df, dct, level=2):
    index = df.index
    index.set_levels([[dct.get(item, item) for item in names] if i==level else    names
                       for i, names in enumerate(index.levels)], inplace=True)
dct = {'amount':'daily_added_value'}
map_level(dav.T, dct, level=2)