# How to use the confusion matrix module in NLTK?

I followed the NLTK book in using the confusion matrix, but the confusion matrix looks very odd.

``````
import nltk
from nltk.corpus import brown

# Empirically examine where the tagger is making mistakes.
# t2 is the tagger trained earlier in the book's chapter 5 example (not shown here).
test_tags = [tag for sent in brown.sents(categories='editorial')
                 for (word, tag) in t2.tag(sent)]
gold_tags = [tag for (word, tag) in brown.tagged_words(categories='editorial')]
print(nltk.ConfusionMatrix(gold_tags, test_tags))
``````

Can anyone explain how to use the confusion matrix?

Answer by alvas:
First, I assume you got the code from chapter 5 of the old `NLTK` book: https://nltk.googlecode.com/svn/trunk/doc/book/ch05.py, in particular this section: http://pastebin.com/EC8fFqLU

Now let's look at how the confusion matrix in `NLTK` works. Try:

``````
from nltk.metrics import ConfusionMatrix

ref = 'DET NN VB DET JJ NN NN IN DET NN'.split()
tagged = 'DET VB VB DET NN NN NN IN DET NN'.split()
cm = ConfusionMatrix(ref, tagged)
print(cm)
``````

[out]:

``````
    | D         |
    | E I J N V |
    | T N J N B |
----+-----------+
DET |<3>. . . . |
 IN | .<1>. . . |
 JJ | . .<.>1 . |
 NN | . . .<3>1 |
 VB | . . . .<1>|
----+-----------+
(row = reference; col = test)
``````

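The numbers enclosed in `<>` on the diagonal are the true positives (tp): tokens whose test tag matches the reference tag. Every off-diagonal number is an error. In the example above, one `JJ` in the reference was wrongly tagged as `NN` in the tagged output; that instance counts as one false positive for `NN` and one false negative for `JJ`. Likewise, the `NN` that was tagged as `VB` counts as one false positive for `VB` and one false negative for `NN`.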
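To see where those cells come from, here is a minimal pure-Python sketch (no NLTK needed) that counts each aligned (reference, test) tag pair with `collections.Counter`. It assumes the two lists are aligned token by token, and it only illustrates what the matrix tabulates; it is not how `ConfusionMatrix` is implemented internally.

``````python
from collections import Counter

ref = 'DET NN VB DET JJ NN NN IN DET NN'.split()
tagged = 'DET VB VB DET NN NN NN IN DET NN'.split()

# Each aligned (reference, test) pair is one cell of the matrix:
# the reference tag picks the row, the test tag picks the column.
cells = Counter(zip(ref, tagged))

for (gold, guess), count in sorted(cells.items()):
    # Diagonal cells (gold == guess) are the true positives
    kind = "true positive" if gold == guess else "error"
    print(f"{gold:>3} -> {guess:<3} {count}  {kind}")
``````

The two `error` lines (`JJ -> NN` and `NN -> VB`) are exactly the two off-diagonal `1`s in the printed matrix.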

To calculate precision, recall and F-score, you need the true positives, false negatives and false positives for each label. You can read them off the confusion matrix by indexing it with `cm[reference_tag, test_tag]`:

``````
from collections import Counter

labels = set('DET NN VB IN JJ'.split())

true_positives = Counter()
false_negatives = Counter()
false_positives = Counter()

# cm[i, j] = number of tokens with reference tag i that were tagged as j
for i in labels:
    for j in labels:
        if i == j:
            # Diagonal: correctly tagged tokens
            true_positives[i] += cm[i, j]
        else:
            # Off-diagonal: i was missed (FN for i), j was wrongly predicted (FP for j)
            false_negatives[i] += cm[i, j]
            false_positives[j] += cm[i, j]

print("TP:", sum(true_positives.values()), true_positives)
print("FN:", sum(false_negatives.values()), false_negatives)
print("FP:", sum(false_positives.values()), false_positives)
``````

[out]:

``````
TP: 8 Counter({'DET': 3, 'NN': 3, 'VB': 1, 'IN': 1, 'JJ': 0})
FN: 2 Counter({'NN': 1, 'JJ': 1, 'VB': 0, 'DET': 0, 'IN': 0})
FP: 2 Counter({'VB': 1, 'NN': 1, 'DET': 0, 'JJ': 0, 'IN': 0})
``````
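As a cross-check, the same per-label counts can be derived directly from the two aligned tag lists without building a matrix. This is a minimal sketch under that alignment assumption; unlike the matrix-based loop, labels with a zero count simply do not appear in these `Counter`s.

``````python
from collections import Counter

ref = 'DET NN VB DET JJ NN NN IN DET NN'.split()
tagged = 'DET VB VB DET NN NN NN IN DET NN'.split()

true_positives = Counter()
false_negatives = Counter()
false_positives = Counter()

for gold, guess in zip(ref, tagged):
    if gold == guess:
        true_positives[gold] += 1
    else:
        # The reference label was missed ...
        false_negatives[gold] += 1
        # ... and the predicted label was assigned where it did not belong
        false_positives[guess] += 1

print("TP:", sum(true_positives.values()), dict(true_positives))
print("FN:", sum(false_negatives.values()), dict(false_negatives))
print("FP:", sum(false_positives.values()), dict(false_positives))
``````

Because every mistake adds one false negative (for the reference tag) and one false positive (for the predicted tag), the FN and FP totals are always equal, which is why both are 2 here.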

To calculate the F-score for each label:

``````
for i in sorted(labels):
    if true_positives[i] == 0:
        # No true positives means an F-score of 0 (this also avoids dividing by zero)
        fscore = 0
    else:
        precision = true_positives[i] / (true_positives[i] + false_positives[i])
        recall = true_positives[i] / (true_positives[i] + false_negatives[i])
        fscore = 2 * (precision * recall) / (precision + recall)
    print(i, fscore)
``````

[out]:

``````
DET 1.0
IN 1.0
JJ 0
NN 0.75
VB 0.6666666666666666
``````
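For reference, the loop applies the standard per-label definitions of precision $P$, recall $R$ and their harmonic mean $F_1$:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}
$$

Plugging in the counts above for `NN` (TP = 3, FP = 1, FN = 1) and `VB` (TP = 1, FP = 1, FN = 0):

$$
\begin{aligned}
\text{NN:}\quad & P = \tfrac{3}{4},\; R = \tfrac{3}{4},\; F_1 = \frac{2 \cdot \tfrac{3}{4} \cdot \tfrac{3}{4}}{\tfrac{3}{4} + \tfrac{3}{4}} = 0.75 \\
\text{VB:}\quad & P = \tfrac{1}{2},\; R = 1,\; F_1 = \frac{2 \cdot \tfrac{1}{2} \cdot 1}{\tfrac{1}{2} + 1} = \tfrac{2}{3} \approx 0.667
\end{aligned}
$$

For `JJ`, TP = 0 and FP = 0, so precision would be $0/0$; that is the case the `true_positives[i] == 0` branch handles by reporting 0.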

I hope the above de-confuses confusion matrix usage in `NLTK`. Here's the full code for the example above:

``````
from collections import Counter
from nltk.metrics import ConfusionMatrix

ref = 'DET NN VB DET JJ NN NN IN DET NN'.split()
tagged = 'DET VB VB DET NN NN NN IN DET NN'.split()
cm = ConfusionMatrix(ref, tagged)

print(cm)

labels = set('DET NN VB IN JJ'.split())

true_positives = Counter()
false_negatives = Counter()
false_positives = Counter()

# cm[i, j] = number of tokens with reference tag i that were tagged as j
for i in labels:
    for j in labels:
        if i == j:
            true_positives[i] += cm[i, j]
        else:
            false_negatives[i] += cm[i, j]
            false_positives[j] += cm[i, j]

print("TP:", sum(true_positives.values()), true_positives)
print("FN:", sum(false_negatives.values()), false_negatives)
print("FP:", sum(false_positives.values()), false_positives)
print()

for i in sorted(labels):
    if true_positives[i] == 0:
        # No true positives means an F-score of 0 (this also avoids dividing by zero)
        fscore = 0
    else:
        precision = true_positives[i] / (true_positives[i] + false_positives[i])
        recall = true_positives[i] / (true_positives[i] + false_negatives[i])
        fscore = 2 * (precision * recall) / (precision + recall)
    print(i, fscore)
``````
