Sachin_ruk - 8 months ago
Python Question

Pandas: convert categories to numbers

Suppose I have a dataframe with countries that goes as:

cc | temp
US | 37.0
CA | 12.0
US | 35.0
AU | 20.0

I know that there is a pd.get_dummies function to convert the countries to 'one-hot encodings'. However, I want to convert them to integer indices instead, so that I get:
cc_index = [1,2,1,3]

I'm assuming that there is a faster way than using get_dummies together with a numpy where clause, as shown below:

[np.where(x) for x in pd.get_dummies(df.cc).values]

This is somewhat easier to do in R using 'factors' so I'm hoping pandas has something similar.


First, change the type of the column:

df.cc = pd.Categorical(df.cc)

Now the data look similar but are stored categorically. To capture the category codes:

df['code'] = df.cc.cat.codes

Now you have:

   cc  temp  code
0  US  37.0     2
1  CA  12.0     1
2  US  35.0     2
3  AU  20.0     0
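
For reference, here is a minimal self-contained sketch of the two steps above. It rebuilds the sample frame from the question, so the only assumption is that the data really look like that table. Note that the codes follow the sorted category order (AU, CA, US), not the order in which the countries appear.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "cc": ["US", "CA", "US", "AU"],
    "temp": [37.0, 12.0, 35.0, 20.0],
})

# Store the column as a categorical; categories are sorted: AU, CA, US.
df["cc"] = pd.Categorical(df["cc"])

# Each row's code is the position of its value in df["cc"].cat.categories.
df["code"] = df["cc"].cat.codes

print(df)
```

To map a code back to its label, look it up in df["cc"].cat.categories.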

If you don't want to modify your DataFrame but simply get the codes:

df.cc.astype('category').cat.codes
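
A small sketch of the non-modifying route, again on the question's sample data. The second half is not part of the answer above: it uses pd.factorize, a different pandas function that numbers values in order of first appearance, which seems closer to the [1,2,1,3] the question asks for once 1 is added. The variable names are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "cc": ["US", "CA", "US", "AU"],
    "temp": [37.0, 12.0, 35.0, 20.0],
})

# Codes from a temporary categorical; df itself is left unchanged.
sorted_codes = df["cc"].astype("category").cat.codes

# Alternative: pd.factorize numbers values by first appearance, starting at 0.
codes, uniques = pd.factorize(df["cc"])

# Shift by one to get the 1-based indices from the question.
cc_index = (codes + 1).tolist()

print(sorted_codes.tolist())
print(cc_index)
```

Which one fits depends on whether the numbering should follow sorted labels or order of appearance.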

Or use the categorical column as an index:

df2 = pd.DataFrame(df.temp)
df2.index = pd.CategoricalIndex(df.cc)
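
As a rough illustration of what the categorical index gives you, the sketch below rebuilds the sample data and then reads the codes off the index and selects rows by label. The names index_codes and us_rows are only for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "cc": ["US", "CA", "US", "AU"],
    "temp": [37.0, 12.0, 35.0, 20.0],
})

# Keep only the temperatures and index them by country category.
df2 = pd.DataFrame(df["temp"])
df2.index = pd.CategoricalIndex(df["cc"])

# The index carries the same integer codes that .cat.codes would give.
index_codes = df2.index.codes.tolist()

# Label-based selection returns every row for that country.
us_rows = df2.loc["US"]

print(index_codes)
print(us_rows)
```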