Anatoliy Yakimets - 4 months ago
Python Question

Python replacing column values with 0-x indexes (for xgboost)

I'm using Python (with pandas, etc.).
I have a DataFrame with a label column (classes a, b, c, etc.; 38 in total).
I want to use XGBoost for prediction, but it only accepts labels that are integers in the range 0 to num_classes - 1.

So basically I need to replace all values in the label column with a 0-based class index (a with 0, b with 1, c with 2, etc.).

There are 38 classes, so mapping and replacing them by hand is not practical.
Is there an elegant way to do this?
In R I would use:

train_data$Class <- as.numeric(factor(train_data$Class))

But that does not work here.


You could use the pandas.factorize function, which returns a tuple of (integer codes, unique labels):

import pandas as pd
df.Class = pd.factorize(df.Class)[0]
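
A minimal sketch on a toy frame (the labels below are made up for illustration). By default, factorize numbers the labels in order of first appearance. Passing sort=True numbers them in sorted order instead, so a becomes 0, b becomes 1, and so on, which is closer to what R's factor does (except that R starts at 1):

```python
import pandas as pd

# Toy data; the labels are made up for illustration.
df = pd.DataFrame({"Class": ["c", "a", "b", "a", "c"]})

# Default: codes follow the order of first appearance (c=0, a=1, b=2).
codes_by_appearance, _ = pd.factorize(df["Class"])

# sort=True: codes follow the sorted label order (a=0, b=1, c=2).
codes_sorted, labels = pd.factorize(df["Class"], sort=True)

# Replace the label column with the sorted 0-based codes.
df["Class"] = codes_sorted
```

Either numbering gives codes in the 0 to num_classes - 1 range; sort=True only matters if you want the codes to follow alphabetical order.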

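If you want to go backward, store the result of factorize and use the codes to look up the original labels:

factor = pd.factorize(df.Class)
# forward: replace the labels with integer codes
df.Class = factor[0]
# backward: factor[1] holds only the unique labels, so index it with the codes
df.Class = factor[1].take(factor[0])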
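
The stored unique labels are also what you need to turn a model's integer predictions back into class names. A small sketch of the round trip, assuming a toy frame and a made-up array of predicted codes (predicted is hypothetical and stands in for whatever the classifier returns):

```python
import numpy as np
import pandas as pd

# Toy data; the labels are made up for illustration.
df = pd.DataFrame({"Class": ["c", "a", "b", "a", "c"]})
original = df["Class"].copy()

# Forward: codes are 0-based integers, uniques holds one entry per class.
codes, uniques = pd.factorize(df["Class"])
df["Class"] = codes

# Hypothetical predicted codes, standing in for a classifier's output.
predicted = np.array([2, 0, 1])
# Look up the label for each predicted code.
predicted_labels = uniques.take(predicted)

# Backward: restore the original label column from the stored codes.
df["Class"] = uniques.take(codes)
```

Keep uniques around for as long as you need to decode; calling factorize again on different data can assign different codes to the same label.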