toasteez toasteez - 4 months ago 51
Python Question

Adding and Renaming a Column in a Multiindex DataFrame

The purpose of this post is to understand how to add a column to a level in a

MultiIndex.DataFrame
using
apply()
and
shift()


Create the DataFrame

import pandas as pd

df = pd.DataFrame(
[
[5777, 100, 5385, 200, 5419, 4887, 100, 200],
[4849, 0, 4539, 0, 3381, 0, 0, ],
[4971, 0, 3824, 0, 4645, 3424, 0, 0, ],
[4827, 200, 3459, 300, 4552, 3153, 100, 200, ],
[5207, 0, 3670, 0, 4876, 3358, 0, 0, ],
],
index=pd.to_datetime(['2010-01-01',
'2010-01-02',
'2010-01-03',
'2010-01-04',
'2010-01-05']),
columns=pd.MultiIndex.from_tuples(
[('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'injection'),
('Portfolio B', 'EUR', 'amount'), ('Portfolio B', 'EUR', 'injection'),
('Portfolio A', 'USD', 'amount'), ('Portfolio A', 'USD', 'injection'),
('Portfolio B', 'JPY', 'amount'), ('Portfolio B', 'JPY', 'injection')])
).sortlevel(axis=1)

print df


I would like to use the following method to add a new column to each currency at level 2 named daily_added_value:

def do_nothing(group):
return group

def calc_daily_added_value(group):
g = (group['amount'] - group['amount'].shift(periods=1, freq=None, axis=0)
-df['injection'].shift(periods=1, freq=None, axis=0)).round(decimals=2)
g.index = ['daily_added_value']
return g

pd.concat([df.T.groupby(level=0).apply(f).T for f in [calc_daily_added_value,do_nothing ]], axis=1).sort_index(axis=1)


However this throws a key error:
KeyError: 'amount'


What is the correct syntax for the method
calc_daily_added_value()
?




Following on from the answer below there is still an issue

Adding the daily return works

dav = df.loc[:, pd.IndexSlice[:, :, 'daily_added_value']]
amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
dr = (dav.values / amount.shift()) * 100
dr.columns.set_levels(['daily_return'], level=2, inplace=True)
df = pd.concat([df, dr], axis=1).sortlevel(axis=1)


Adding the cumulative compounded returns FAILS

dr = df.loc[:, pd.IndexSlice[:, :, 'daily_return']]
drc = 100*((1+dr / 100).cumprod()-1)
drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)
df = pd.concat([df, drc], axis=1).sort_index(axis=1)
df.head()


this fails because it is missing the .values, but if I add this it becomes an array?

What is strange here though is that drc is in fact a DataFrame of correct shaped etc. and appears to contain correct results.

This fails on this line:

drc.columns.set_levels(['daily_return_cumulative'], level=2, inplace=True)


Error is
ValueError: On level 2, label max (2) >= length of level (1). NOTE: this index is in an inconsistent state


How can the index be placed back into a consistent state?

Answer

Skip the groupby it is not necessary

amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
inject = df.loc[:, pd.IndexSlice[:, :, 'injection']]
dav = amount - amount.shift() - inject.shift().values
#dav.columns.set_levels(['daily_added_value'], level=2, inplace=True)

pd.concat([df, dav], axis=1).sort_index(axis=1).T

Note: I used T to get a picture that would easily fit

enter image description here

there appears to be a bug in set_levels and as such it is not advised to use it.

Workaround to rename the MultiIndex Column in the DataFrame dav

def map_level(df, dct, level=2):
    index = df.index
    index.set_levels([[dct.get(item, item) for item in names] if i==level else    names
                       for i, names in enumerate(index.levels)], inplace=True)
dct = {'amount':'daily_added_value'}
map_level(dav.T, dct, level=2)