Talka - 1 month ago
Python Question

# Confusion matrix rows are mismatched with labels

I've created a confusion matrix that works all right, but its rows don't seem to correspond to the labels as they should.

I have a list of strings which is split into train and test sections:

`````` train + test:
positive: 16 + 4 = 20
negprivate:  53 + 14 = 67
negstratified: 893 + 224 = 1117
``````

The confusion matrix is built on the test data:

`````` [[  0  14   0]
[  3 220   1]
[  0   4   0]]
``````

Here is the code:

``````import logging

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

my_tags = ['negprivate', 'negstratified', 'positive']

def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
    logging.info('plot_confusion_matrix')
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(my_tags))
    target_names = my_tags
    plt.xticks(tick_marks, target_names, rotation=45)
    plt.yticks(tick_marks, target_names)
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()

def evaluate_prediction(target, predictions, taglist, title="Confusion matrix"):
    logging.info('Evaluate prediction')
    print('accuracy %s' % accuracy_score(target, predictions))
    cm = confusion_matrix(target, predictions)
    print('confusion matrix\n %s' % cm)
    print('(row=expected, col=predicted)')
    print('rows: \n %s \n %s \n %s ' % (taglist[0], taglist[1], taglist[2]))

    # normalize each row so it sums to 1
    cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    plot_confusion_matrix(cm_normalized, title + ' Normalized')
``````

...

``````test_targets, test_regressors = zip(
    *[(doc.tags[0], doc2vec_model.infer_vector(doc.words, steps=20)) for doc in alltest])
logreg = linear_model.LogisticRegression(n_jobs=1, C=1e5)
logreg = logreg.fit(train_regressors, train_targets)
evaluate_prediction(test_targets, logreg.predict(test_regressors), my_tags,
                    title=str(doc2vec_model))
``````

But the point is that I currently have to look at the numbers in the resulting matrix and reorder `my_tags` by hand until the labels line up with the rows. As far as I understand, this matching should happen automatically.
How can I do that?
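For reference, `sklearn.metrics.confusion_matrix` takes an optional `labels` parameter that pins the row/column order to exactly the list you pass in, so the tags and the matrix stay in sync without manual inspection. A minimal sketch with made-up `y_true`/`y_pred` values for illustration:

```python
from sklearn.metrics import confusion_matrix

my_tags = ['negprivate', 'negstratified', 'positive']

# toy data, purely for illustration
y_true = ['positive', 'negprivate', 'negstratified', 'negprivate']
y_pred = ['negstratified', 'negprivate', 'negstratified', 'positive']

# Without labels=, rows/columns follow sorted(set(labels)).
# With labels=, row i and column j correspond to my_tags[i] / my_tags[j].
cm = confusion_matrix(y_true, y_pred, labels=my_tags)
print(cm)
```

Passing `labels=taglist` inside `evaluate_prediction` would therefore make the printed tag list and the matrix rows agree by construction.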

It's usually best to work with integer class labels; everything tends to run a bit more smoothly. You can get these using `LabelEncoder`, e.g.

``````from sklearn import preprocessing
my_tags = ['negprivate', 'negstratified', 'positive']
le = preprocessing.LabelEncoder()
new_tags = le.fit_transform(my_tags)
``````

So now you will have `[0 1 2]` as your new tags. When you do your plotting, you want your labels to be intuitive, so you can use `inverse_transform` to get the original strings back, e.g.

``````le.inverse_transform([0])[0]
``````

Outputs:

``````'negprivate'
``````
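The "automatic way" the question asks for falls out of `le.classes_`: the encoder stores the classes in sorted order, which is the same order `confusion_matrix` uses by default, so the two stay aligned. A small sketch of the roundtrip (the toy true/predicted values are made up for illustration):

```python
from sklearn import preprocessing
from sklearn.metrics import confusion_matrix

my_tags = ['negprivate', 'negstratified', 'positive']
le = preprocessing.LabelEncoder()
new_tags = le.fit_transform(my_tags)
print(new_tags)           # [0 1 2]

# classes_ holds the label strings in sorted order
print(list(le.classes_))  # ['negprivate', 'negstratified', 'positive']

# A confusion matrix over the encoded labels orders its rows 0, 1, 2,
# which matches le.classes_ element for element:
cm = confusion_matrix([0, 1, 2, 1], [0, 1, 1, 1])
print(cm)
```

So the tick labels for the plot can simply be `le.classes_` instead of a hand-ordered list.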
Source: Stack Overflow