splinter splinter - 1 month ago 9
Python Question

Looking for a pandas function analogous to DataFrame.nafill()

I would like to apply a function that acts like

fillna()
but takes a different value than nan. Unfortunately
DataFrame.replace()
will not work in my case. Here is an example: Given a DataFrame:

df = pd.DataFrame([[1,2,3],[4,-1,-1],[5,6,-1]])

0 1 2
0 1 2.0 3.0
1 4 -1.0 -1.0
2 5 6.0 -1.0
3 7 8.0 NaN


I am looking for a function which will output:

0 1 2
0 1 2.0 3.0
1 4 2.0 3.0
2 5 6.0 3.0
3 7 8.0 NaN


So
df.replace()
with
to_replace=-1
and
'method='ffill'
will not work because it requires a column-independent
value
which will replace the -1 entries. In my example it is column-dependent. I know I can code it with a loop but am looking for an efficient code as it will be applied to a large DataFrame. Any suggestions? Thank you.

Answer

You can just replace the value with NaN and then call ffill:

In [3]:

df.replace(-1, np.NaN).ffill()
Out[3]:
   0  1  2
0  1  2  3
1  4  2  3
2  5  6  3

I think you're over thinking this

EDIT

If you already have NaN values then create a boolean mask and update just those elements again with ffill on the inverse of the mask:

In [15]:    
df[df == -1] = df[df != -1].ffill()
df

Out[15]:
   0  1   2
0  1  2   3
1  4  2   3
2  5  6   3
3  7  8 NaN

Another method (thanks to @DSM in comments) is to use where to essentially do the same thing as above:

In [17]:
df.where(df != -1, df.replace(-1, np.nan).ffill())

Out[17]:
   0  1   2
0  1  2   3
1  4  2   3
2  5  6   3
3  7  8 NaN