crippledlambda - 5 months ago
Python Question

Assign pandas dataframe column dtypes

I want to set the dtypes of multiple columns in pd.DataFrame (I have a file that I've had to manually parse into a list of lists, as the file was not amenable to pd.read_csv).

import pandas as pd
print(pd.DataFrame([['a','1'],['b','2']],
                   dtype={'x':'object','y':'int'},
                   columns=['x','y']))


I get

ValueError: entry not a 2- or 3- tuple


The only way I can set them is by looping through each column and recasting with astype.

dtypes = {'x':'object','y':'int'}
mydata = pd.DataFrame([['a','1'],['b','2']],
                      columns=['x','y'])
for c in mydata.columns:
    mydata[c] = mydata[c].astype(dtypes[c])
print(mydata['y'].dtype)  #=> int64


Is there a better way?

Answer

You can use convert_objects to infer better dtypes:

In [11]: df
Out[11]: 
   x  y
0  a  1
1  b  2

In [12]: df.dtypes
Out[12]: 
x    object
y    object
dtype: object

In [13]: df.convert_objects(convert_numeric=True)
Out[13]: 
   x  y
0  a  1
1  b  2

In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]: 
x    object
y     int64
dtype: object

Magic!
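
A note for newer pandas: convert_objects was later deprecated and then removed in pandas 1.0, so the call above will not run there. Below is a minimal sketch of two alternatives, assuming a reasonably recent pandas in which DataFrame.astype accepts a column-to-dtype dict. The names typed and inferred are only for illustration, and 'int64' is spelled out instead of 'int' so the result does not depend on the platform's default integer size.

```python
import pandas as pd

# Same list-of-lists data as in the question; every value starts out as a string.
df = pd.DataFrame([['a', '1'], ['b', '2']], columns=['x', 'y'])

# Option 1: pass the whole column-to-dtype mapping to astype in one call,
# which replaces the per-column loop from the question.
dtypes = {'x': 'object', 'y': 'int64'}
typed = df.astype(dtypes)

# Option 2: let pandas infer a numeric dtype for a chosen column,
# roughly what convert_objects(convert_numeric=True) did for 'y'.
inferred = df.copy()
inferred['y'] = pd.to_numeric(inferred['y'])

print(typed.dtypes)
print(inferred.dtypes)
```

astype raises if a value cannot be cast, which is usually what you want when the dtypes are known up front. pd.to_numeric is the closer match when the column's numeric type should be inferred from the data.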