gincard - 1 year ago
Python Question

Python Pandas - Changing some column types to categories

I have read the following CSV file into an IPython Notebook:

public = pd.read_csv("categories.csv")

I've also imported pandas as pd, numpy as np and matplotlib.pyplot as plt. The following data types are present (the below is a summary - there are about 100 columns):

In [36]: public.dtypes
Out[37]:
parks          object
playgrounds    object
sports         object
roading        object
resident        int64
children        int64

I want to change 'parks', 'playgrounds', 'sports' and 'roading' to categories, leaving the remainder as int64. These four columns contain Likert scale responses, though each column has a different type of Likert response (e.g. one has "strongly agree", "agree" etc., another has "very important", "important" etc.).

I was able to create a separate dataframe - public1 - and change one of the columns to a category type using the following code:

public1 = {'parks': public.parks}
public1 = public1['parks'].astype('category')

However, when I tried to change several columns at once using this code, I was unsuccessful:

public1 = {'parks': public.parks,
'playgrounds': public.parks}
public1 = public1['parks', 'playgrounds'].astype('category')

Notwithstanding this, I don't want to create a separate dataframe with just the categories columns. I would like them changed in the original dataframe.

I tried numerous ways to achieve this, then tried the code here: Pandas: change data type of columns...

public[['parks', 'playgrounds', 'sports', 'roading']] = public[['parks', 'playgrounds', 'sports', 'roading']].astype('category')

and got the following error:

NotImplementedError: > 1 ndim Categorical are not supported at this time

Is there a way to change 'parks', 'playgrounds', 'sports' and 'roading' to categories (so the Likert scale responses can then be analysed), leaving 'resident' and 'children' (and the 94 other columns that are strings, ints and floats) untouched, please? Or is there a better way to do this? If anyone has any suggestions and/or feedback I would be most grateful - I am slowly going bald ripping my hair out!

Many thanks in advance.

edited to add - I am using Python 2.7.

Answer Source

Sometimes, you just have to use a for-loop:

for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
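
For anyone who wants to try the loop without the original CSV, here is a minimal sketch on a toy frame. The data values and the `parks_scale` labels are made up for illustration and are not from the question. The last step, giving one column an explicit order, is optional and just one possible way to make Likert responses comparable.

```python
import pandas as pd

# Toy stand-in for the question's data; the values are invented for illustration.
public = pd.DataFrame({
    "parks": ["agree", "strongly agree", "agree", "disagree"],
    "playgrounds": ["important", "very important", "important", "important"],
    "resident": [1, 0, 1, 1],
})

likert_cols = ["parks", "playgrounds"]

# Convert each listed column in place; every other column is left untouched.
for col in likert_cols:
    public[col] = public[col].astype("category")

# Optional: give one column an explicit order (hypothetical scale labels),
# so that comparisons, sorting, min() and max() follow the scale.
parks_scale = ["strongly disagree", "disagree", "agree", "strongly agree"]
public["parks"] = public["parks"].cat.set_categories(parks_scale, ordered=True)

print(public.dtypes)
```

As far as I know, newer pandas releases (0.23 and later) also accept astype('category') on a multi-column selection, but the loop has the advantage of working on older versions as well.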