crippledlambda - 1 year ago 248

Python Question

I want to set the `dtype` of each column of a `pd.DataFrame` that I build from a list of lists rather than with `pd.read_csv`:

    import pandas as pd

    print(pd.DataFrame([['a', '1'], ['b', '2']],
                       dtype={'x': 'object', 'y': 'int'},
                       columns=['x', 'y']))

I get `ValueError: entry not a 2- or 3- tuple`.

The only way I can set them is by looping through each column variable and recasting with `astype`:

    dtypes = {'x': 'object', 'y': 'int'}
    mydata = pd.DataFrame([['a', '1'], ['b', '2']],
                          columns=['x', 'y'])
    for c in mydata.columns:
        mydata[c] = mydata[c].astype(dtypes[c])
    print(mydata['y'].dtype)  #=> int64

Is there a better way?

Answer

You can use `convert_objects` to infer better dtypes (this method exists only in older pandas; it was removed in pandas 1.0):

```
In [11]: df
Out[11]:
   x  y
0  a  1
1  b  2

In [12]: df.dtypes
Out[12]:
x    object
y    object
dtype: object

In [13]: df.convert_objects(convert_numeric=True)
Out[13]:
   x  y
0  a  1
1  b  2

In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]:
x    object
y     int64
dtype: object
```

*Magic!*
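For readers on a pandas version without `convert_objects`, here is a minimal sketch of two alternatives that are not part of the original answer. It assumes a reasonably recent pandas in which `DataFrame.astype` accepts a column-to-dtype mapping and `pd.to_numeric` is available. The names `typed` and `inferred` are only illustrative, and `'int64'` is used in place of the question's `'int'` so the result does not depend on the platform's default integer width.

```python
import pandas as pd

# Same toy data as in the question: every value starts out as a string.
df = pd.DataFrame([['a', '1'], ['b', '2']], columns=['x', 'y'])

# Option 1: cast several columns at once with a column -> dtype mapping.
# This replaces the explicit loop over columns from the question.
dtypes = {'x': 'object', 'y': 'int64'}
typed = df.astype(dtypes)
print(typed.dtypes)

# Option 2: parse one column's strings as numbers and let pandas pick
# the numeric dtype, roughly what convert_numeric=True did for 'y'.
inferred = df.copy()
inferred['y'] = pd.to_numeric(inferred['y'])
print(inferred.dtypes)
```

The mapping form suits cases where the target dtypes are known up front, as in the question. `pd.to_numeric` is closer in spirit to the inference shown above, but it works one column at a time and raises by default if a value cannot be parsed.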