Chris - 2 months ago
Python Question

Set dtypes in pandas DataFrame

I want to load some data into a pandas DataFrame and assign a dtype to each column on import. I need this to work for larger datasets with many different columns, but as a small example:

import numpy as np
import pandas as pd

myarray = np.random.randint(0, 5, size=(2, 2))
mydf = pd.DataFrame(myarray, columns=['a', 'b'], dtype=[float, int])

results in:

TypeError: data type not understood

I tried a few other methods such as:

mydf = pd.DataFrame(myarray,columns=['a','b'], dtype={'a': int})

TypeError: object of type 'type' has no len()

If I pass a single type (for example dtype=float), it applies that type to both columns.

In the end, I would like to pass a list of dtypes the same way I can pass a list of column names.


I just ran into this, and the pandas issue is still open, so I'm posting my workaround. Assuming df is my DataFrame and dtype is a dict mapping column names to types:

# Iterating a dict directly yields only its keys, so use .items() for (column, type) pairs
for k, v in dtype.items():
    df[k] = df[k].astype(v)
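
The same idea can probably be written more compactly. This is a minimal sketch, assuming the installed pandas version accepts a column-to-dtype dict in DataFrame.astype. The names columns and dtypes are illustrative and not part of the question. It builds the frame first and then casts each column, which gets close to passing a list of dtypes alongside the list of column names:

```python
import numpy as np
import pandas as pd

myarray = np.random.randint(0, 5, size=(2, 2))
columns = ['a', 'b']
dtypes = [float, int]  # illustrative list: one dtype per column, in column order

# Pair each column name with its dtype, then cast all columns in one call.
dtype_map = dict(zip(columns, dtypes))
mydf = pd.DataFrame(myarray, columns=columns).astype(dtype_map)

print(mydf.dtypes)
```

Columns left out of the mapping keep their existing dtype, so a partial dict such as {'a': float} should also work. The exact integer width for int depends on the platform.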