Zhihong Deng - 1 month ago
Python Question

Why are data types changed when calling the apply function in Pandas?

When I use the apply function to process a DataFrame, the data types of its columns change unexpectedly. What should I do to prevent this?

For example:

In [1]: import pandas as pd

In [2]: from pandas import DataFrame

In [3]: tmp = DataFrame({'item':[1,2,3]})

In [4]: tmp['score'] = 0.0

In [5]: tmp.dtypes
Out[5]:
item       int64
score    float64
dtype: object

In [6]: tmp
Out[6]:
   item  score
0     1    0.0
1     2    0.0
2     3    0.0

In [7]: def Test(x):
   ...:     return x
   ...:

In [8]: tmp = tmp.apply(Test,axis=1)

In [9]: tmp.dtypes
Out[9]:
item     float64
score    float64
dtype: object

The data type of tmp['item'] was changed to float. How can I keep its original data type?

Answer

This happens because .apply with axis=1 iterates over rows and passes each row to the function as a Series. Since a Series holds a single data type, a Series built from a row that mixes int and float values promotes the ints to float:

In [4]: def test(x): return x

In [5]: tmp.iloc[0]
Out[5]: 
item     1.0
score    0.0
Name: 0, dtype: float64

In [6]: tmp.apply(test, axis=1)
Out[6]: 
   item  score
0   1.0    0.0
1   2.0    0.0
2   3.0    0.0
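
The promotion can also be checked without apply at all. The snippet below is a minimal sketch that rebuilds the tmp frame from the question and compares NumPy's common type for the two column dtypes with the dtype of a single row; the variable names common, row_dtype and column_dtypes are only illustrative.

```python
import numpy as np
import pandas as pd

tmp = pd.DataFrame({'item': [1, 2, 3]})
tmp['score'] = 0.0

# NumPy's promotion rule for an integer dtype next to a float dtype.
common = np.result_type(tmp['item'].dtype, tmp['score'].dtype)

# A row pulled out as a Series adopts that common dtype ...
row_dtype = tmp.iloc[0].dtype

# ... while the columns of the frame itself keep their own dtypes.
column_dtypes = tmp.dtypes.to_dict()

print(common, row_dtype, column_dtypes)
```

So the upcast happens when the row is turned into a Series, before the applied function ever sees it.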

Note what happens when we select a column, though:

In [7]: tmp.iloc[:,0]
Out[7]: 
0    1
1    2
2    3
Name: item, dtype: int64

In [8]: tmp.apply(test, axis=0)
Out[8]: 
   item  score
0     1    0.0
1     2    0.0
2     3    0.0