user3318023 user3318023 - 3 days ago 5
Python Question

GradientBoostingClassifier and many columns

I use GradientBoosting classifier to predict gender of users. The data have a lot of predictors and one of them is the country. For each country I have binary column. There are always only one column set to 1 for all country columns. But such desicion is very slow from computation point of view. Is there any way to represent country columns with only one column? I mean correct way.

Answer

You can replace the binary variable with the actual country name then collapse all of these columns into one column. Use LabelEncoder on this column to create a proper integer variable and you should be all set.

Comments