Ariff Yasri - 2 months ago
Python Question

Limit feature_importances_ output to the top N features in a pandas DataFrame

I want to limit how many rows of my feature_importances_ output are shown when I display it with a DataFrame.
Below is my code (adapted from this blog):

import pandas as pd
from sklearn import tree

# Hold out 10% of the rows for testing
train = df_visualization.sample(frac=0.9, random_state=639)
test = df_visualization.drop(train.index)

train.to_csv('train.csv', encoding='utf-8')
test.to_csv('test.csv', encoding='utf-8')

# The first 65 columns are the features; column 65 is the target
train_dis = train.iloc[:, :66]
train_val = train_dis.values
train_in = train_val[:, :65]
train_out = train_val[:, 65]

test_dis = test.iloc[:, :66]
test_val = test_dis.values
test_in = test_val[:, :65]
test_out = test_val[:, 65]

dt = tree.DecisionTreeClassifier(random_state=59, criterion='entropy')
dt = dt.fit(train_in, train_out)

score = dt.score(train_in, train_out)
test_predicted = dt.predict(test_in)

# Print the feature ranking
print("Feature ranking:")

print(pd.DataFrame(dt.feature_importances_, columns=["Imp"], index=train.iloc[:, :65].columns).sort_values(['Imp'], ascending=False))


My problem is that it displays all 65 features.
Output:

Imp
wbc 0.227780
age 0.100949
gcs 0.069359
hr 0.069270
rbs 0.053418
sbp 0.052067
Intubation-No 0.050729
... ...
Babinski-Normal 0.000000
ABG-Metabolic Alkolosis 0.000000
ABG-Respiratory Acidosis 0.000000
Reflexes-Unilateral Hyperreflexia 0.000000
NS-No 0.000000


For example, I want only the top 5 features.
Expected output:

Imp
wbc 0.227780
age 0.100949
gcs 0.069359
hr 0.069270
rbs 0.053418


Update:
I found a way to display only the top 5 using itertuples.

display = pd.DataFrame(dt.feature_importances_, columns=["Imp"], index=train.iloc[:, :65].columns).sort_values(['Imp'], ascending=False)
x = 0
for row, col in display.itertuples():
    # Stop after the first 5 (highest-importance) rows
    if x < 5:
        print(row, "=", col)
    else:
        break
    x += 1


Output:

Feature ranking:
wbc = 0.227780409582
age = 0.100949241154
gcs = 0.0693593476192
hr = 0.069270425399
rbs = 0.0534175402602


But is this an efficient way to get the output?

Answer

Try this:

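import numpy as np

feature_cols = train.iloc[:, :65].columns  # names of the 65 feature columns
indices = np.argsort(dt.feature_importances_)[::-1]  # feature positions, most important first
for i in range(5):
    print(" %s = %s" % (feature_cols[indices[i]], dt.feature_importances_[indices[i]]))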
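
If you would rather stay in pandas, another option is to let pandas do the truncation instead of looping. The sketch below is only an illustration: `importances` and `feature_names` are hypothetical stand-ins for `dt.feature_importances_` and `train.iloc[:,:65].columns`, filled with a few values copied from the output in the question so that it runs without the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for dt.feature_importances_ and the feature column
# names; the numbers are copied from the output shown in the question.
importances = np.array([0.053418, 0.227780, 0.0, 0.100949, 0.069270, 0.069359, 0.052067])
feature_names = ["rbs", "wbc", "NS-No", "age", "hr", "gcs", "sbp"]

imp = pd.Series(importances, index=feature_names, name="Imp")

# nlargest returns the n largest values, sorted in descending order
top5 = imp.nlargest(5)
print(top5.to_frame())

# Same result with the DataFrame approach from the question:
# sort descending, then keep the first 5 rows with head()
top5_df = imp.to_frame().sort_values("Imp", ascending=False).head(5)
print(top5_df)
```

With the real model, appending `.head(5)` to the `sort_values(...)` call in the question should be enough to print only the top 5 rows.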