jesseWUT - 2 months ago

Python Question

I have a pandas DataFrame in which I would like to fill in some NaN values.

```
import pandas as pd

tuples = [('a', 1990), ('a', 1994), ('a', 1996), ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = ['NaN', 'NaN', 'SomeName', 'NaN', 'SomeOtherName', 'SomeThirdName']
df = pd.DataFrame(vals, index=index)
print(df)
```

```
                       0
Type Year
a    1990            NaN
     1994            NaN
     1996       SomeName
b    1992            NaN
     1997  SomeOtherName
c    2001  SomeThirdName
```

The output that I would like is:

```
Type Year
a    1990       SomeName
     1994       SomeName
     1996       SomeName
b    1992  SomeOtherName
     1997  SomeOtherName
c    2001  SomeThirdName
```

This needs to be done on a much larger DataFrame (millions of rows) where each 'Type' can have between 1-5 unique 'Years' and the name value is only present for the most recent year. I'm trying to avoid iterating over rows for performance purposes.

Answer

You can sort your data frame by index in descending order and then `ffill` it:

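```
import pandas as pd

# Sort both index levels in descending order so the most recent year of each
# Type comes first, then forward-fill the missing values.
# This assumes the missing entries are real np.nan values (see the note below).
df.sort_index(level=[0, 1], ascending=False).ffill()
#                        0
# Type Year
# c    2001  SomeThirdName
# b    1997  SomeOtherName
#      1992  SomeOtherName
# a    1996       SomeName
#      1994       SomeName
#      1990       SomeName
```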

Note: The example data doesn't actually contain `np.nan` values but the string `NaN`, so for `ffill` to work you first need to replace the `NaN` strings with `np.nan`:

```
import numpy as np
df[0] = np.where(df[0] == "NaN", np.nan, df[0])
```
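Putting the two steps together, a minimal end-to-end sketch on the example data from the question might look like the following. The variable name `filled`, the use of `Series.mask` for the string replacement, and the final `sort_index()` to restore ascending order are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
tuples = [('a', 1990), ('a', 1994), ('a', 1996),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = ['NaN', 'NaN', 'SomeName', 'NaN', 'SomeOtherName', 'SomeThirdName']
df = pd.DataFrame(vals, index=index)

# Step 1: turn the "NaN" strings into real missing values.
# Series.mask replaces entries where the condition is True with NaN.
df[0] = df[0].mask(df[0] == 'NaN')

# Step 2: sort descending so each Type starts with its most recent year,
# then forward-fill downwards into the older years.
filled = df.sort_index(level=[0, 1], ascending=False).ffill()

# Step 3 (optional): restore the original ascending order.
filled = filled.sort_index()
print(filled)
```

This relies on the condition stated in the question: every Type has a name in its most recent year. If some Type had no name at all, the forward fill would carry over a value from the neighbouring Type.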

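Or, as @ayhan suggested, after replacing the string "NaN" with `np.nan`, use `df.bfill()`. Because the rows are already ordered by year within each type, the back-fill needs no descending sort.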
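A plain `bfill` (or `ffill`) works across the whole column, so it is only safe when every Type has a name in its most recent year. As a hedged variant, the sketch below groups by the `Type` index level before back-filling, so values cannot leak from one Type into another. The sample data here is hypothetical (type `b` deliberately has no name), and the names `plain` and `grouped` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: type 'b' has no name in any year.
tuples = [('a', 1990), ('a', 1996), ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = [np.nan, 'SomeName', np.nan, np.nan, 'SomeThirdName']
df = pd.DataFrame(vals, index=index)

# Plain back-fill: rows of 'b' pick up the name that belongs to 'c'.
plain = df.bfill()[0]

# Grouped back-fill: each Type is filled only from its own rows,
# so 'b' stays missing instead of inheriting a wrong name.
grouped = df.groupby(level='Type')[0].bfill()

print(plain)
print(grouped)
```

The grouped version does not iterate over rows in Python, so it should still be reasonable for large frames, though it is worth timing both variants on the real data.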

Source (Stack Overflow)
