user12202013 - 1 month ago
Python Question

Easy way to apply transformation from `pandas.get_dummies` to new data?

Suppose I have a data frame `data` with string columns that I want converted to indicators. I use `pandas.get_dummies(data)` to convert it into a dataset that I can use for building a model.

Now I have a single new observation that I want to run through my model. I can't simply call `pandas.get_dummies(new_data)`, because the new observation doesn't contain all of the classes and so won't produce the same indicator columns. Is there a good way to do this?

JAB
Answer

You can create the dummies from the single new observation, and then reindex that frame's columns using the columns from the original indicator matrix:

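import pandas as pd
# Original data used to fit the model
df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
dummies_frame = pd.get_dummies(df)
# Dummies for the single new observation (only cat_a is produced)
df1 = pd.get_dummies(pd.DataFrame({'cat': ['a'], 'val': [1]}))
# Align to the training columns; missing indicators are filled with 0
df1.reindex(columns=dummies_frame.columns, fill_value=0)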

returns:

   val  cat_a  cat_b  cat_c  cat_d
0    1      1      0      0      0
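
If this needs to be done repeatedly, the same reindex idea can be wrapped in a small function. The following is only a sketch: `encode_like_training` is a hypothetical helper name, not part of pandas, and `dtype=int` is an assumption added so the indicators come out as 0/1 integers (recent pandas versions return booleans by default). It also shows one caveat of this approach: a category that never appeared in the original data (here `'e'`) is silently dropped, so all of its indicator columns end up 0.

```python
import pandas as pd

# Training data, same as in the answer above.
train = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
# dtype=int is an assumption: it forces 0/1 integers instead of booleans.
train_dummies = pd.get_dummies(train, dtype=int)


def encode_like_training(new_data, training_columns):
    """Hypothetical helper: dummy-encode new_data, then align it to
    the columns produced for the training data."""
    encoded = pd.get_dummies(new_data, dtype=int)
    # Missing indicator columns are added and filled with 0;
    # columns not seen in training (e.g. cat_e) are dropped.
    return encoded.reindex(columns=training_columns, fill_value=0)


# A category seen in training.
seen = encode_like_training(
    pd.DataFrame({'cat': ['b'], 'val': [2]}), train_dummies.columns
)

# A category NOT seen in training: every cat_* indicator is 0.
unseen = encode_like_training(
    pd.DataFrame({'cat': ['e'], 'val': [3]}), train_dummies.columns
)
```

Whether an all-zero row is an acceptable encoding for an unseen category depends on the model, so it may be worth checking for unknown values before calling the helper.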