user1074057 user1074057 - 2 months ago 15
Python Question

Creating dummy variables in pandas for python

I'm trying to create a series of dummy variables from a categorical variable using pandas in python. I've come across the get_dummies function, but whenever I try to call it I receive an error that the name is not defined.

Any thoughts or other ways to create the dummy variables would be appreciated.

EDIT: Since others seem to be coming across this, the

get_dummies
function in pandas now works perfectly fine. This means the following should work:

import pandas as pd

dummies = pd.get_dummies(df['Category'])


See http://blog.yhathq.com/posts/logistic-regression-and-python.html for further information.

Answer

It's hard to infer what you're looking for from the question, but my best guess is as follows.

If we assume you have a DataFrame where some column is 'Category' and contains integers (or otherwise unique identifiers) for categories, then we can do the following.

Call the DataFrame dfrm, and assume that for each row, dfrm['Category'] is some value in the set of integers from 1 to N. Then,

for elem in dfrm['Category'].unique():
    dfrm[str(elem)] = dfrm['Category'] == elem

Now there will be a new indicator column for each category that is True/False depending on whether the data in that row are in that category.

If you want to control the category names, you could make a dictionary, such as

cat_names = {1:'Some_Treatment', 2:'Full_Treatment', 3:'Control'}
for elem in dfrm['Category'].unique():
    dfrm[cat_names[elem]] = dfrm['Category'] == elem

to result in having columns with specified names, rather than just string conversion of the category values. In fact, for some types, str() may not produce anything useful for you.