user3642173 - 27 days ago
Python Question

sklearn predict_proba not matching class labels

I have trained a RandomForestClassifier on my dataset to predict 8 different topics from a body of text. For a given example, the dataset looks as follows:

X_train = [[0,0,0,0,0,1,0,0,1,0], ...]
# Each row is a bag-of-words vector

y_train = ["A", "B", "C"]
# 8 categories in total

If I run the following code

rdf = RandomForestClassifier(n_estimators = 100)
rdf_fitted = rdf.fit(X_train, y_train)
print(rdf_fitted.predict(x_test[0]))
print(rdf_fitted.predict_proba(x_test[0]))
print(rdf_fitted.classes_)

I get a strange result

[0.7, 0.2, 0.1]

Basically, the predicted label ("B" in this case) does not match the
output of predict_proba, which suggests that "A" has the highest probability.

Any ideas what's causing this?

Answer Source

This issue was caused by a mistake in my Jupyter Notebook setup.
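
For anyone who hits the same symptom, here is a minimal sanity check that may help rule out the library itself. It is only a sketch: the toy data, the variable names and the random_state value are made up for illustration and are not the asker's setup. It relies on the columns of predict_proba following the order of classes_, so for a single-output RandomForestClassifier the label from predict should equal classes_ indexed by the argmax of each probability row.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up bag-of-words data (hypothetical, not the asker's dataset).
X_train = np.array([
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
])
y_train = np.array(["A", "A", "B", "B", "C", "C"])
X_test = np.array([[1, 0, 0, 0], [0, 0, 1, 0]])  # 2D: one row per sample

rdf = RandomForestClassifier(n_estimators=100, random_state=0)
rdf.fit(X_train, y_train)

labels = rdf.predict(X_test)        # one label per test row
proba = rdf.predict_proba(X_test)   # shape (n_samples, n_classes)

# Column j of proba corresponds to rdf.classes_[j].
labels_from_proba = rdf.classes_[np.argmax(proba, axis=1)]

for label, row in zip(labels, proba):
    print(label, dict(zip(rdf.classes_, row.round(2))))

# If this fails in a notebook, suspect stale variables from an earlier cell
# (for example, a model or test set left over from a previous run).
assert (labels == labels_from_proba).all()
```

If the assertion passes in a fresh kernel but the mismatch shows up in the notebook, restarting the kernel and running all cells top to bottom is a reasonable next step.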