jesseWUT - 2 months ago

Fill NaN based on MultiIndex Pandas

I have a pandas DataFrame in which I would like to fill in some NaN values.

import pandas as pd

tuples = [('a', 1990),('a', 1994),('a',1996),('b',1992),('b',1997),('c',2001)]
index = pd.MultiIndex.from_tuples(tuples, names = ['Type', 'Year'])
vals = ['NaN','NaN','SomeName','NaN','SomeOtherName','SomeThirdName']
df = pd.DataFrame(vals, index=index)


                      0
Type Year
a    1990           NaN
     1994           NaN
     1996      SomeName
b    1992           NaN
     1997 SomeOtherName
c    2001 SomeThirdName

The output that I would like is:

                      0
Type Year
a    1990      SomeName
     1994      SomeName
     1996      SomeName
b    1992 SomeOtherName
     1997 SomeOtherName
c    2001 SomeThirdName

This needs to be done on a much larger DataFrame (millions of rows), where each 'Type' can have between 1 and 5 unique 'Year' values and the name is only present for the most recent year. I'm trying to avoid iterating over rows for performance reasons.


You can sort your DataFrame by its index in descending order, so that each Type's most recent year comes first, and then forward-fill it with ffill:

import pandas as pd
df.sort_index(level=[0, 1], ascending=False).ffill()

#                           0
# Type  Year    
#    c  2001    SomeThirdName
#    b  1997    SomeOtherName
#       1992    SomeOtherName
#    a  1996    SomeName
#       1994    SomeName
#       1990    SomeName
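A self-contained sketch of the whole sort-then-ffill pipeline on the question's sample data. It first converts the literal "NaN" strings into real missing values (otherwise ffill has nothing to fill), and the final `sort_index()` call, which is an addition of this sketch, restores the original ascending order:

```python
import numpy as np
import pandas as pd

# Sample data from the question; the "missing" values are the string "NaN".
tuples = [('a', 1990), ('a', 1994), ('a', 1996),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = ['NaN', 'NaN', 'SomeName', 'NaN', 'SomeOtherName', 'SomeThirdName']
df = pd.DataFrame(vals, index=index)

# Turn the "NaN" strings into real missing values so ffill can see them.
df[0] = np.where(df[0] == "NaN", np.nan, df[0])

# Newest-first order puts each Type's named row before its older rows,
# so a forward fill copies the name down; then sort back to ascending.
filled = df.sort_index(level=[0, 1], ascending=False).ffill().sort_index()

print(filled)
```

Note that a whole-frame ffill does not respect Type boundaries; it works here because every Type has a name in its most recent year.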

Note: the example data doesn't actually contain np.nan values but the string "NaN", so for ffill to work you first need to replace the "NaN" strings with np.nan:

import numpy as np
df[0] = np.where(df[0] == "NaN", np.nan, df[0])
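An equivalent idiom, offered here as an alternative rather than as the answer's method, is `DataFrame.replace`, which swaps every exact "NaN" string in all columns at once:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with "NaN" stored as a string.
tuples = [('a', 1990), ('a', 1994), ('a', 1996),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = ['NaN', 'NaN', 'SomeName', 'NaN', 'SomeOtherName', 'SomeThirdName']
df = pd.DataFrame(vals, index=index)

# Replace exact matches of the string "NaN" with a real missing value.
df = df.replace("NaN", np.nan)

# Count the real missing values per column.
print(df.isna().sum())
```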

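Alternatively, as @ayhan suggested, after replacing the string "NaN" with np.nan you can use df.bfill() directly: since the example index is already sorted in ascending order, backward-filling pulls each Type's most recent name back to its earlier years without re-sorting.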
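Both the whole-frame ffill and bfill can carry a name across Type boundaries if some Type had no name at all. A minimal sketch of a per-Type variant using `groupby(level='Type').bfill()`, assuming the same string-to-NaN replacement and an index sorted ascending by Type and Year; the extra Type 'ab' is hypothetical, added only to show the difference:

```python
import numpy as np
import pandas as pd

# Question's sample data plus a hypothetical Type 'ab' with no name at all.
tuples = [('a', 1990), ('a', 1994), ('a', 1996), ('ab', 1995),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = ['NaN', 'NaN', 'SomeName', 'NaN', 'NaN', 'SomeOtherName', 'SomeThirdName']
df = pd.DataFrame(vals, index=index)
df[0] = np.where(df[0] == "NaN", np.nan, df[0])

# Whole-frame bfill: 'ab' borrows the next valid value, which belongs to 'b'.
whole = df.bfill()

# Grouped bfill: filling happens only within each Type, so 'ab' stays NaN.
per_type = df.groupby(level='Type').bfill()

print(whole.loc['ab'])
print(per_type.loc['ab'])
```

The grouped version does a little more work than a single whole-frame fill, but it stays vectorized and avoids silently mislabeling rows.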