Tim Finkel - 21 days ago
Python Question

Unexpected pandas.Series.replace() behavior

Given this -

import pandas as pd


s = pd.Series(['', '1', '2', '', '4', '', '6'])


Why does this -

s.replace('', None).values


Result in this -

array(['', '1', '2', '2', '4', '4', '6'], dtype=object)


When I would expect this -

array([None, '1', '2', None, '4', None, '6'], dtype=object)

Answer

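The None is the problem here. In Series.replace, None is already the default for the value argument, so passing it explicitly is indistinguishable from not passing a value at all. This is the usual Python convention (docs):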

None

The sole value of types.NoneType. None is frequently used to represent the absence of a value, as when default arguments are not passed to a function.

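So s.replace('', None) is the same as s.replace(''). When to_replace is a scalar and no value is given, replace falls back to its method argument, which defaults to 'pad', so each match is forward filled from the element before it. The first '' has nothing before it, which is why it is left unchanged. To get missing values instead, you can use np.nan: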

import numpy as np

pd.Series(['', '1', '2', '', '4', '', '6']).replace('', np.nan)
Out: 
0    NaN
1      1
2      2
3    NaN
4      4
5    NaN
6      6
dtype: object

Or pass a dict:

s.replace({'': None})
Out: 
0    None
1       1
2       2
3    None
4       4
5    None
6       6
dtype: object
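
To make the forward fill concrete, here is a minimal sketch that imitates it in plain Python and then checks that both workarounds mark the same positions as missing. The helper pad_replace is hypothetical and written only for illustration; it is not how pandas implements replace. The Series is built with dtype=object as an assumption, to match the dtype shown in the question.

```python
import numpy as np
import pandas as pd

values = ['', '1', '2', '', '4', '', '6']


def pad_replace(items, target):
    """Replace each `target` with the nearest preceding non-target item.

    Hypothetical helper that imitates a forward fill; it is not pandas code.
    """
    result = []
    previous = None  # last non-target item seen so far
    for item in items:
        if item == target and previous is not None:
            result.append(previous)
        else:
            # Either a normal item, or a leading target with nothing to copy.
            result.append(item)
            if item != target:
                previous = item
    return result


padded = pad_replace(values, '')
print(padded)  # ['', '1', '2', '2', '4', '4', '6']

# Both workarounds from the answer should flag the same positions as missing.
s = pd.Series(values, dtype=object)
nan_mask = s.replace('', np.nan).isna().tolist()
none_mask = s.replace({'': None}).isna().tolist()
print(nan_mask == none_mask)
```

The padded list matches the array from the question, including the leading '' that has no earlier value to copy. Comparing with isna() rather than checking for None directly treats None and NaN alike, which is usually what matters downstream.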