RomB RomB - 3 months ago 6
Python Question

dtypes changed when initialising a new DataFrame from another one

Let's say that I have a DataFrame df1 with 2 columns: a with dtype bool and b with dtype int64. When I initialise a new DataFrame (df1_bis) from df1, columns a and b are automatically converted into objects, even if I force the dtype of df1_bis:

In [2]: df1 = pd.DataFrame({"a": [True], 'b': [0]})

In [3]: df1
Out[3]:
      a  b
0  True  0

In [4]: df1.dtypes
Out[4]:
a     bool
b    int64
dtype: object

In [5]: df1_bis = pd.DataFrame(df1.values, columns=df1.columns, dtype=df1.dtypes)

In [6]: df1_bis
Out[6]:
      a  b
0  True  0

In [7]: df1_bis.dtypes
Out[7]:
a    object
b    object
dtype: object


Is there something I'm doing wrong with the dtype argument of DataFrame?

Answer

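NumPy is causing the problem. A NumPy array has a single dtype, so df1.values on a frame with mixed bool and int64 columns returns an array of dtype object, and pandas keeps that dtype for every column. Note also that the dtype argument of DataFrame accepts only a single dtype, not one per column. If you convert the values to a list first, pandas infers each column's dtype from the Python objects and you won't have the problem.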

df1_bis = pd.DataFrame(df1.values.tolist(),
                       columns=df1.columns)


print(df1_bis)
print()
print(df1_bis.dtypes)

      a  b
0  True  0

a     bool
b    int64
dtype: object
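
For completeness, here is a minimal sketch of two other ways to keep the original dtypes. It is not part of the answer above. It assumes the same df1 as in the question, and the names values_dtype, df1_copy and df1_cast are made up for illustration. The first option avoids the NumPy round trip entirely; the second rebuilds from df1.values and then casts each column back with astype.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [True], "b": [0]})

# A NumPy array holds a single dtype, so the mixed bool/int64 columns
# are upcast to object when the frame is converted with .values.
values_dtype = df1.values.dtype

# Option 1: copy the frame directly, with no round trip through NumPy.
df1_copy = df1.copy()

# Option 2: rebuild from the object array, then cast each column back
# using the original per-column dtypes (astype accepts a column -> dtype dict).
df1_cast = pd.DataFrame(df1.values, columns=df1.columns).astype(df1.dtypes.to_dict())

print(values_dtype)
print(df1_copy.dtypes)
print(df1_cast.dtypes)
```

If all you need is an independent copy of df1, copy() is probably the simplest choice; the astype route is only useful when the data really has to pass through a NumPy array first.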