ozhogin - 24 days ago
Python Question

Pandas replace with string and integers - incorrect behavior?

I ran into what looks like incorrect behavior in pandas replace when a DataFrame contains both strings and integers. If the DataFrame has both 0 (integer) and '0' (string), then replacing '0' affects the integers as well as the strings.
Here is a minimal example:

In [1]: df = pd.DataFrame({'numbers' : [0, 1, 2, 0], 'strings' : ['0', 1, 2, '0']})


To confirm that the columns have the expected dtypes:

In [2]: df.dtypes
Out[2]:
numbers int64
strings object
dtype: object


And check individual values:

In [3]: type(df['numbers'][0])
Out[3]: numpy.int64
In [4]: type(df['strings'][0])
Out[4]: str
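
Since the 'strings' column mixes str and int values, checking only the first element does not show the whole picture. A minimal sketch that lists the Python type of every cell in that column (the variable name cell_types is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'numbers': [0, 1, 2, 0], 'strings': ['0', 1, 2, '0']})

# Map each cell of the object column to the name of its Python type.
cell_types = df['strings'].map(lambda v: type(v).__name__).tolist()
print(cell_types)
```

This makes it easy to see which rows hold the string '0' and which hold integers before running replace.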


Now run the replace:

In [5]: df.replace(to_replace='0', value=np.NaN, inplace=True)
In [6]: df.head()
Out[6]:
numbers strings
0 NaN NaN
1 1 1
2 2 2
3 NaN NaN


As you can see, both the strings and the integers were replaced, although only the strings should have been. Doing the same with the integer 0 works correctly:

In [7]: df = pd.DataFrame({'numbers' : [0, 1, 2, 0], 'strings' : ['0', 1, 2, '0']})
...: df.replace(to_replace=0, value=np.NaN, inplace=True)
...: print(df.head())
Out[7]:
numbers strings
0 NaN 0
1 1 1
2 2 2
3 NaN 0


Is this the correct behavior, or should I report a bug? I'm using pandas 0.19.0.

Thank you!

Update: The bug has been reported and confirmed. @nickil-maveli provided a workaround that works in the meantime:
df.replace(to_replace=['0'], value=[np.NaN], inplace=True)

Answer

The bug was reported and confirmed by the developers. @nickil-maveli provided a workaround that works in the meantime, which is to pass both arguments as lists: df.replace(to_replace=['0'], value=[np.NaN], inplace=True)
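
If you would rather not depend on how replace matches values at all, one alternative is to build the mask yourself with an explicit type check. This is only a sketch, not the fix adopted by the pandas developers; the names is_str_zero and result are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'numbers': [0, 1, 2, 0], 'strings': ['0', 1, 2, '0']})

# True only where the cell is a str equal to '0'; integer zeros stay False.
is_str_zero = df.apply(
    lambda col: col.map(lambda v: isinstance(v, str) and v == '0').astype(bool)
)

# Replace the masked cells with NaN and leave everything else untouched.
# np.nan is used here; the np.NaN alias from the post was removed in NumPy 2.0.
result = df.mask(is_str_zero, np.nan)
print(result)
```

Because the comparison happens in plain Python with isinstance, the integer 0 can never match the string '0', whatever the pandas version does inside replace.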
