obiigbe91 - 7 months ago 53

Python Question

I am trying to get the ROC curve for a binary (good/bad) classifier that I used for a project. This classifier uses a genetic algorithm to make predictions.

E.g. a test chromosome given by [1.0,0.5,0.4,0.7] is said to be good if it matches another chromosome, say [0.8,0.5,0.3,0.6]. By matching, I mean having a Euclidean distance (from the other chromosome) below a particular value.

I have completed the classification of the 600 instances, and I have the final confusion matrix (by this matrix I mean the four-valued table from which we can calculate the final TPR and FPR), the correct classification label for each instance, and the prediction for each instance.

I have read the following material on ROC curves: *Receiver operating characteristic* and *Tools for Machine Learning Performance Evaluation: ROC Curves in Python*. How do I proceed to get the ROC curve?

With my final four-valued table I think I can only plot a single point on the curve. The links above keep mentioning that I need a score (i.e. a probability score), but I don't know how I can get this for a genetic algorithm classifier. How do I use the knowledge of each instance's prediction to create a kind of continuous ROC curve?

Disclaimer: I am new to the ROC plotting thing, and I am coding this in Python - hence, I attached the Python-related ROC documents.

Answer

It does not matter how you created your classifier. In the end, your model simply gives a positive label iff `||x - x_i|| < T`, where `T` is some predefined threshold. ROC curves are parametrized by exactly this kind of thing: a scalar value that you can vary to bias the classifier toward positive or toward negative predictions. So simply go through multiple values of `T`, compute the TPR and FPR for each value, and the resulting points form your ROC curve. That's all!
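The threshold sweep described above could look roughly like this in plain Python. It is only a sketch under a few assumptions: you kept the Euclidean distance of every instance to its reference chromosome (not just the final good/bad prediction), the helper names `euclidean`, `roc_points` and `auc` are made up for illustration, and the six distances and labels at the bottom are toy values, not your data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length chromosomes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def roc_points(distances, labels):
    """Sweep the threshold T and return a list of (FPR, TPR) points.

    distances[i]: distance of instance i to its reference chromosome.
    labels[i]:    True if instance i is really "good" (positive).
    An instance is predicted positive iff distances[i] < T.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    points = []
    # Every observed distance is a useful threshold. The smallest one
    # predicts nothing positive (strict <), and infinity predicts everything
    # positive, so the curve runs from (0, 0) to (1, 1).
    for t in sorted(set(distances)) + [math.inf]:
        tp = sum(1 for d, y in zip(distances, labels) if d < t and y)
        fp = sum(1 for d, y in zip(distances, labels) if d < t and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Distance for the example pair from the question.
example_distance = euclidean([1.0, 0.5, 0.4, 0.7], [0.8, 0.5, 0.3, 0.6])

# Toy data (hypothetical): smaller distance should mean "more likely good".
distances = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [True, True, False, True, False, False]

points = roc_points(distances, labels)
area = auc(points)
```

To draw the curve you can pass the FPR values as x and the TPR values as y to any plotting library, e.g. `matplotlib.pyplot.plot`. If you would rather use a ready-made routine, the negated distance can serve as the "score" those ROC tutorials ask for, since a smaller distance means a more confident positive.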