user3642173 - 27 days ago
Python Question

sklearn predict_proba not matching class labels

I have trained a RandomForestClassifier on my dataset to predict 8 different topics from a body of text. A sample of the dataset looks like this:

X_train = [[0,0,0,0,0,1,0,0,1,0],
[0,1,0,0,0,0,0,0,0,1],
[1,0,0,0,0,0,0,0,0,1]]
# This is a bag of words

y_train = ["A", "B", "C"]
# 8 categories in total


If I run the following code

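from sklearn.ensemble import RandomForestClassifier

rdf = RandomForestClassifier(n_estimators=100)
rdf_fitted = rdf.fit(X_train, y_train)
# predict and predict_proba expect a 2D array, so wrap the single sample in a list
print(rdf_fitted.predict([x_test[0]]))
print(rdf_fitted.predict_proba([x_test[0]]))
# The columns of predict_proba follow the order of classes_
print(rdf_fitted.classes_)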


I get a strange result

["B"]
[0.7, 0.2, 0.1]
["A","B","C"...]


Basically, the predicted label ("B" in this case) does not match the predict_proba predictions, which suggest that "A" has the highest probability.

Any ideas what's causing this?

Answer

This issue was caused by a mistake in my Jupyter Notebook setup.
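
For anyone who sees a similar mismatch, a quick sanity check is to rebuild the labels from predict_proba and compare them with predict. The columns of predict_proba are ordered like classes_, and for a single-output RandomForestClassifier the predicted label is the class with the highest averaged probability, so the two should agree. The sketch below is only an illustration: the toy data (the three example rows repeated, with three labels instead of eight), the random_state value, and the name labels_from_proba are assumptions and not part of the original setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy bag-of-words rows and labels, made up for illustration only.
X_train = np.array([
    [0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
] * 5)
y_train = np.array(["A", "B", "C"] * 5)

# random_state is fixed only to make the sketch repeatable.
rdf = RandomForestClassifier(n_estimators=100, random_state=0)
rdf.fit(X_train, y_train)

# Reuse three training rows as a stand-in test set (2D input).
x_test = X_train[:3]

# Shape (n_samples, n_classes); column j belongs to rdf.classes_[j].
proba = rdf.predict_proba(x_test)

# Map the most probable column of each row back to its class label.
labels_from_proba = rdf.classes_[np.argmax(proba, axis=1)]
predicted = rdf.predict(x_test)

print(rdf.classes_)
print(proba)
print(predicted)
print(labels_from_proba)
```

If predicted and labels_from_proba differ in a check like this, the cause is more likely in the environment than in sklearn, for example stale notebook state where the fitted model, the test row, or the printed variables come from different cell runs.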