Mth Clv Mth Clv - 3 months ago 71
Python Question

Pandas: Multi-index apply function between column and index

I have a multi-index dataframe that look like this:

In[13]: df
Out[13]:
Last Trade
Date Ticker
1983-03-30 CLM83 1983-05-18
CLN83 1983-06-17
CLQ83 1983-07-18
CLU83 1983-08-19
CLV83 1983-09-16
CLX83 1983-10-18
CLZ83 1983-11-18
1983-04-04 CLM83 1983-05-18
CLN83 1983-06-17
CLQ83 1983-07-18
CLU83 1983-08-19
CLV83 1983-09-16
CLX83 1983-10-18
CLZ83 1983-11-18


With two levels for indexes (namely 'Date' and 'Ticker'). I would like to apply a function to the column 'Last Trade' that would let me know how many months separate this 'Last Trade' date from the index 'Date'
I found a function that does the calculation:

from calendar import monthrange

def monthdelta(d1, d2):
delta = 0
while True:
mdays = monthrange(d1.year, d1.month)[1]
d1 += datetime.timedelta(days=mdays)
if d1 <= d2:
delta += 1
else:
break
return delta


I tried to apply the following function h but it returns me an AttributeError: 'Timestamp' object has no attribute 'index':

In[14]: h = lambda x: monthdelta(x.index.get_level_values(0),x)

In[15]: df['Last Trade'] = df['Last Trade'].apply(h)


How can I apply a function that would use both a column and an index value?

Thank you for your tips,

Answer

Try this instead of your function:

Option 1

You get an integer number

def monthdelta(row):
    trade = row['Last Trade'].year*12 + row['Last Trade'].month
    date = row['Date'].year*12 + row['Date'].month
    return trade - date

df.reset_index().apply(monthdelta, axis=1)

Inspired by PiRsquared:

df = df.reset_index()
(df['Last Trade'].dt.year*12 + df['Last Trade'].dt.month) -\
(df['Date'].dt.year*12 + df['Date'].dt.month)

Option 2

You get a numpy.timedelta64

Which can be directly used for other date computations. However, this will be in the form of days, not months, because the number of days in a month are not constant.

def monthdelta(row):
    return row['Last Trade'] - row['Date']

df.reset_index().apply(monthdelta, axis=1)

Inspired by PiRsquared:

df = df.reset_index()
df['Last Trade'] - df['Date']

Option 2 will of course be faster, because it involves less computations. Pick what you like!


To get your index back: df.index = df[['Date', 'Ticker']]