I'm using Python (with pandas etc.).
I have a data frame with a label column (classes a, b, c, etc., 38 in total).
I want to use XGBoost for prediction, but it only works for integer labels in the range 0 to num_class - 1.
So basically I need:
- to replace all values in the label column with an index from 0 to num_class - 1 (a with 0, b with 1, c with 2, etc.)
The number of classes is 38, so mapping and replacing manually is not possible.
Is there an elegant way to do this?
In R I would use:
train_data$Class <- as.numeric(factor(train_data$Class))
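You could use the pandas.factorize function. It returns a (codes, uniques) tuple, so take the first element to get the integer codes:
import pandas as pd
# factorize returns (codes, uniques); keep only the integer codes
df.Class = pd.factorize(df.Class)[0]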
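If you want to go backward (from codes to the original labels), store both return values and reassign:
codes, uniques = pd.factorize(df.Class)
# forward: labels -> integer codes
df.Class = codes
# backward: integer codes -> original labels
df.Class = uniques[codes]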
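For completeness, here is a small self-contained sketch of the round trip. The toy data frame and the "label" and "restored" column names are made up for illustration. By default factorize numbers labels in order of first appearance; passing sort=True numbers them in sorted order instead, which is closer to what R's factor() does (though R's codes start at 1, not 0).

```python
import pandas as pd

# Hypothetical toy frame standing in for the real training data.
df = pd.DataFrame({"Class": ["b", "a", "c", "a", "b"]})

# sort=True orders the unique labels before numbering them,
# so "a" -> 0, "b" -> 1, "c" -> 2 regardless of row order.
codes, uniques = pd.factorize(df["Class"], sort=True)
df["label"] = codes

# Map integer codes (e.g. model predictions) back to the original labels.
df["restored"] = uniques.take(df["label"].to_numpy())

print(df)
```

Keeping uniques around is what makes it possible to translate the model's predicted class indices back into the original class names later. Note that missing values get the code -1 by default, so they would need to be handled before training.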