jesseWUT jesseWUT - 3 months ago 10
Python Question

Ascending/Resetting Integer Values from MultiIndex Pandas

I have a dataframe:

import pandas as pd

tuples = [('a', 1990),('a', 1994),('a',1996),('b',1992),('b',1997),('c',2001)]
index = pd.MultiIndex.from_tuples(tuples, names = ['Type', 'Year'])
vals = ['This','That','SomeName','This','SomeOtherName','SomeThirdName']
df = pd.DataFrame(vals, index=index, columns=['Whatev'])


df
Out[3]:
Whatev
Type Year
a 1990 This
1994 That
1996 SomeName
b 1992 This
1997 SomeOtherName
c 2001 SomeThirdName


And I'd like to add a column of ascending integers corresponding to 'Year' that resets for each 'Type', like so:

Whatev IndexInt
Type Year
a 1990 This 1
1994 That 2
1996 SomeName 3
b 1992 This 1
1997 SomeOtherName 2
c 2001 SomeThirdName 1


Here's my current method:

grouped = df.groupby(level=0)
unique_loc = []
for name, group in grouped:
unique_loc += range(1,len(group)+1)
joined['IndexInt'] = unique_loc


But this seems ugly and convoluted to me, and I imagine it could get slow on the ~50 million row dataframe that I'm working with. Is there a simpler way?

Answer

you can use groupby(level=0) + cumcount():

In [7]: df['IndexInt'] = df.groupby(level=0).cumcount()+1

In [8]: df
Out[8]:
                  Whatev  IndexInt
Type Year
a    1990           This         1
     1994           That         2
     1996       SomeName         3
b    1992           This         1
     1997  SomeOtherName         2
c    2001  SomeThirdName         1
Comments