Lin Ma - 3 months ago
Python Question

pandas load data with data type issues

Below are my code, its output, and the raw CSV file data. Every column in the output has dtype object. Is there a way to have each column recognized as a string, with the last column as a float? I am using Python 2.7 with miniconda.

Code,

import pandas as pd
sample=pd.read_csv('123.csv', sep=',',header=None)
print sample.dtypes


program output,

0 object
1 object
2 object
3 object


123.csv content,

c_a,c_b,c_c,c_d
hello,python,pandas,1.2


Edit 1,

sample = pd.read_csv('123.csv', header=None, skiprows=1,
                     dtype={0:str, 1:str, 2:str, 3:str})
print sample.dtypes

0 object
1 object
2 object
3 object
dtype: object


Edit 2,

sample = pd.read_csv('123.csv', header=None, skiprows=1,
                     dtype={0:str, 1:str, 2:str, 3:str})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('float32')
print sample.dtypes

c_a object
c_b object
c_c object
c_d float32


regards,
Lin

Answer

You have to use the dtype argument. Because you pass header=None, the header line is read as an ordinary data row, so you must also skip it with skiprows: its last field (c_d) is not a float, and the conversion of column 3 would fail on it.

import pandas as pd

# Skip the header line and declare the type of each column by position.
df = pd.read_csv('123.csv', header=None, skiprows=1,
                 dtype={0: str, 1: str, 2: str, 3: float})

The output is:

       0       1       2    3
0  hello  python  pandas  1.2
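
For anyone who wants to try this without the file, here is a minimal sketch that feeds the same two lines to read_csv from memory. The names csv_text, raw and fixed are mine, not from the question. It first shows that with header=None alone the header line ends up as a data row, so column 3 cannot be float, and then applies the skiprows plus dtype fix. As far as I know, on the pandas versions of that time str columns are always reported as object, which is why Edit 1 in the question still prints object everywhere; only the float column changes.

```python
import io
import pandas as pd

# In-memory stand-in for 123.csv (same two lines as in the question).
csv_text = u"c_a,c_b,c_c,c_d\nhello,python,pandas,1.2\n"

# header=None alone: the header line becomes row 0, so column 3 mixes
# the text 'c_d' with '1.2' and cannot be inferred as a float column.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw.dtypes)

# Skipping the header line and declaring the types fixes the last column.
fixed = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1,
                    dtype={0: str, 1: str, 2: str, 3: float})
print(fixed.dtypes)
```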

EDIT:

To give your DataFrame column labels, which may be of different types (here three strings and one float), you can assign to df.columns:

df.columns = pd.Index(data=['c_a', 'c_b', 'c_d', 4.])

and the output is:

     c_a     c_b     c_d  4.0
0  hello  python  pandas  1.2
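
If what you actually want is the original header as column names, a possible alternative (a sketch under the assumption that the first line of the file is always the header) is to let read_csv use that line via header=0 and to key dtype by column name instead of by position. The variable names csv_text and named are made up for this example.

```python
import io
import pandas as pd

# In-memory stand-in for 123.csv (same two lines as in the question).
csv_text = u"c_a,c_b,c_c,c_d\nhello,python,pandas,1.2\n"

# header=0 tells pandas that the first line holds the column names,
# so it is not parsed as data and dtype can refer to those names.
named = pd.read_csv(io.StringIO(csv_text), header=0,
                    dtype={'c_a': str, 'c_b': str, 'c_c': str, 'c_d': float})

print(list(named.columns))
print(named.dtypes)
```

This avoids both skiprows and the separate df.columns assignment, at the cost of relying on the file to supply the names.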