user2314737 - 2 months ago
Python Question

Setting elements to None in pandas dataframe

I'm not sure why this happens. Starting from an integer DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.arange(15).reshape(5,3),columns=list('ABC'))
>>> df
    A   B   C
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14


Assigning None to the elements of a new last row turns them into NaN NaN NaN:

>>> df.ix[5,:] = None
>>> df
     A    B    C
0    0    1    2
1    3    4    5
2    6    7    8
3    9   10   11
4   12   13   14
5  NaN  NaN  NaN


Change two elements in the last column to the string 'nan':

>>> df.ix[:1,2] = 'nan'
>>> df
     A    B    C
0    0    1  nan
1    3    4  nan
2    6    7    8
3    9   10   11
4   12   13   14
5  NaN  NaN  NaN


Now assigning None to the last row again gives NaN NaN None:


>>> df.ix[5,:] = None
>>> df
     A    B     C
0    0    1   nan
1    3    4   nan
2    6    7     8
3    9   10    11
4   12   13    14
5  NaN  NaN  None

Answer

It's because your dtypes are being changed after each assignment:

In [7]: df = pd.DataFrame(np.arange(15).reshape(5,3),columns=list('ABC'))

In [8]: df.dtypes
Out[8]:
A    int32
B    int32
C    int32
dtype: object

In [9]: df.ix[5,:] = None

In [10]: df.dtypes
Out[10]:
A    float64
B    float64
C    float64
dtype: object

In [11]: df.ix[:1,2] = 'nan'

After that last assignment, column C has been implicitly converted to object dtype, because it now holds strings alongside the numbers:

In [12]: df.dtypes
Out[12]:
A    float64
B    float64
C     object
dtype: object
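To make the two dtype changes above concrete, here is a minimal sketch. It uses a standalone Series named col and a copy named mixed (both hypothetical names, not from the question), and it requests the float and object dtypes explicitly instead of relying on implicit upcasting, which newer pandas versions may warn about. It shows that a float column stores None as NaN, while the text 'nan' is just a string and is not treated as a missing value.

```python
import numpy as np
import pandas as pd

# Integer columns cannot represent NaN, so cast to float explicitly
# before storing a missing value.
col = pd.Series([2, 5, 8]).astype('float64')
col.iloc[0] = None               # stored as NaN; dtype stays float64

# The text 'nan' is an ordinary string, so it needs an object column.
mixed = col.astype(object)
mixed.iloc[1] = 'nan'

print(col.dtype, mixed.dtype)
print(pd.isna(col.iloc[0]))      # real missing value
print(pd.isna(mixed.iloc[1]))    # plain string, not missing
```

pd.isna is a handy way to check whether a cell holds a real missing value or merely text that looks like one.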

@ayhan gave a very neat explanation in a comment:

I think the main reason is for numerical columns, when you insert None or np.nan, it is converted to np.nan to have a Series of type float. For objects, it takes whatever is passed (if None, it uses None; if np.nan, it uses np.nan - docs)

(c) ayhan

Here is a corresponding demo:

In [39]: df = pd.DataFrame(np.arange(15).reshape(5,3),columns=list('ABC'))

In [40]: df.ix[4, 'A'] = None

In [41]: df.ix[4, 'C'] = np.nan

In [42]: df
Out[42]:
     A   B     C
0  0.0   1   2.0
1  3.0   4   5.0
2  6.0   7   8.0
3  9.0  10  11.0
4  NaN  13   NaN

In [43]: df.dtypes
Out[43]:
A    float64
B      int32
C    float64
dtype: object

In [44]: df.ix[0, 'C'] = 'a string'

In [45]: df
Out[45]:
     A   B         C
0  0.0   1  a string
1  3.0   4         5
2  6.0   7         8
3  9.0  10        11
4  NaN  13       NaN

In [46]: df.dtypes
Out[46]:
A    float64
B      int32
C     object
dtype: object

Now we can use both None and np.nan in the object-dtype column:

In [47]: df.ix[1, 'C'] = None

In [48]: df.ix[2, 'C'] = np.nan

In [49]: df
Out[49]:
     A   B         C
0  0.0   1  a string
1  3.0   4      None
2  6.0   7       NaN
3  9.0  10        11
4  NaN  13       NaN
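The .ix indexer used above was removed in later pandas releases, so here is a rough equivalent of the demo with .loc. It is only a sketch: the column names num and obj are made up for illustration, and the object dtype is set explicitly so that the example does not depend on implicit upcasting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'num': [0.0, 3.0, 6.0],                            # float64 column
    'obj': pd.Series(['a', 'b', 'c'], dtype=object),   # object column
})

df.loc[0, 'num'] = None      # numeric column: None is coerced to NaN
df.loc[0, 'obj'] = None      # object column: None is stored as-is
df.loc[1, 'obj'] = np.nan    # object column: NaN is stored as-is

print(df.dtypes)
print(df.loc[0, 'obj'] is None)
print(pd.isna(df.loc[0, 'num']))
```

Both None and NaN count as missing for pd.isna and dropna, so the distinction mostly matters when you compare values directly or inspect the column's dtype.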