Minh Mai - 1 year ago
Python Question

Feature Importance with XGBClassifier

Hopefully I'm reading this wrong, but the XGBoost library documentation mentions extracting feature importances using the feature_importances_ attribute, much like sklearn's random forest.

However, for some reason, I keep getting this error:
AttributeError: 'XGBClassifier' object has no attribute 'feature_importances_'

My code snippet is below:

from sklearn import datasets
import xgboost as xg

iris = datasets.load_iris()
mask = iris.target < 2  # arbitrarily removing class 2 so the target is only 0 and 1
X = iris.data[mask]     # keep exactly the rows of X that match the rows kept in Y
Y = iris.target[mask]
xgb = xg.XGBClassifier()
fit = xgb.fit(X, Y)

It seems that you can compute feature importance using the Booster object by calling its get_fscore method. The only reason I'm using XGBClassifier is that it can be wrapped in a sklearn pipeline. Any thoughts on extracting the feature importances? Is anyone else experiencing this?

Answer Source

As the comments indicate, I suspect your issue is a versioning one. However, if you don't want to or can't update, the following function should work for you.

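def get_xgb_imp(xgb, feat_names):
    # Older xgboost releases expose booster(); newer ones renamed it to get_booster().
    booster = xgb.get_booster() if hasattr(xgb, 'get_booster') else xgb.booster()
    # Split counts keyed 'f0', 'f1', ... (model fit on a plain array); unused features are absent.
    imp_vals = booster.get_fscore()
    imp_dict = {feat_names[i]:float(imp_vals.get('f'+str(i),0.)) for i in range(len(feat_names))}
    # sum() works on Python 2 and 3; array(imp_dict.values()) breaks on Python 3.
    total = sum(imp_dict.values())
    return {k:v/total for k,v in imp_dict.items()}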

>>> import numpy as np
>>> from xgboost import XGBClassifier
>>> feat_names = ['var1','var2','var3','var4','var5']
>>> np.random.seed(1)
>>> X = np.random.rand(100,5)
>>> y = np.random.rand(100).round()
>>> xgb = XGBClassifier(n_estimators=10)
>>> xgb = xgb.fit(X,y)
>>> get_xgb_imp(xgb,feat_names)
{'var5': 0.0, 'var4': 0.20408163265306123, 'var1': 0.34693877551020408, 'var3': 0.22448979591836735, 'var2': 0.22448979591836735}
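
To see what the function does with the get_fscore() result without training a model, here is a minimal sketch of just the renaming and normalisation step. The helper name normalize_fscores and the split counts are hypothetical: the counts were picked to be consistent with the proportions printed above (17/49, 11/49, 11/49, 10/49) and were not captured from an actual run. The guard for an all-zero total is an addition that get_xgb_imp does not have.

```python
def normalize_fscores(fscores, feat_names):
    """Map 'f<i>' keys to feature names and scale the counts to sum to 1."""
    # Features that never appear in a split are missing from fscores: count them as 0.
    counts = {name: float(fscores.get('f' + str(i), 0.0))
              for i, name in enumerate(feat_names)}
    total = sum(counts.values())
    if total == 0:
        # No splits at all: return zeros rather than dividing by zero.
        return {name: 0.0 for name in feat_names}
    return {name: c / total for name, c in counts.items()}


# Hypothetical split counts, chosen to match the proportions printed above.
fscores = {'f0': 17, 'f1': 11, 'f2': 11, 'f3': 10}
feat_names = ['var1', 'var2', 'var3', 'var4', 'var5']
importances = normalize_fscores(fscores, feat_names)
for name in feat_names:
    print(name, round(importances[name], 4))
```

Note that var5 gets 0.0 only because it has no entry in the score dict, which is why the lookup needs a default.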
