Chuck Fulminata - 7 months ago

Correctly calculating the F1 score in Sklearn

I'm working in Python and I'm trying to get the F1 score of my trained model. The documentation lists the syntax as:

f1_score(y_true, y_pred, average='macro')

but I cannot figure out what y_true and y_pred are supposed to be. Logically, y_true should be the true value of y and y_pred the predicted value of y, but by that definition I can only check one value at a time. Am I missing something, or is there a way to check it against the entire dataset?


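The F-score (F1) is the harmonic mean, a kind of weighted average, of the precision and recall of your predictions, i.e. what portion of your positive predictions were actually true (precision) and what portion of the actual trues you predicted (recall): F1 = 2 * (precision * recall) / (precision + recall)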
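To make the formula concrete, here is a minimal pure-Python sketch that counts true positives, false positives and false negatives for one positive class and then combines precision and recall as their harmonic mean. The function name `binary_f1` and its `pos_label` parameter are hypothetical, chosen for illustration; this is not sklearn's implementation.

```python
def binary_f1(y_true, y_pred, pos_label=True):
    """Return (precision, recall, f1) for a single positive class.

    y_true and y_pred must be equal-length sequences where element i of
    each refers to the same sample.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Count outcomes pair by pair: this is why the ordering must match.
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)

    # Precision: share of predicted positives that are really positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of real positives that were predicted positive.
    recall = tp / (tp + fn) if tp + fn else 0.0

    # F1 is the harmonic mean of precision and recall (0.0 by convention
    # here when both are zero).
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(binary_f1([1, 0, 1, 1, 0], [1, 1, 1, 0, 0], pos_label=1))
```

With these made-up labels there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all come out to 2/3.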

Sklearn's function expects array-likes of labels (e.g. lists or NumPy arrays, or label-indicator matrices for multilabel problems) for y_true and y_pred, where y_true[i] is the actual label of the i-th element and y_pred[i] is the predicted/classified label of the i-th element. The order of the two must match! This element-by-element pairing is what allows sklearn to compute the F-score over all predictions at once instead of for just a single value.
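In practice this means you do not score one sample at a time: you predict on a whole held-out set and pass both label arrays at once. A minimal sketch, assuming a scikit-learn classifier; the iris dataset and `LogisticRegression` are stand-ins for your own data and trained model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Stand-in data: replace with your own features X and labels y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Train the model on the training split only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# predict() returns one label per test row, in the same order as y_test.
y_pred = model.predict(X_test)

# Both arguments are whole arrays; element i of each belongs to sample i.
score = f1_score(y_test, y_pred, average="macro")
print(f"macro F1 on {len(y_test)} test samples: {score:.3f}")
```

Because iris has three classes, an `average` other than the default `'binary'` is needed; `'macro'` takes the unweighted mean of the per-class F1 scores, which matches the call in the question.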

e.g. if I use a classifier/model to predict whether 5 people will get cancer:

y_pred = [True, False, True, False, False]

and I find out that only the 3rd person got cancer:

y_true = [False, False, True, False, False]
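Scoring this example in one call might look like the sketch below, assuming `True` ("gets cancer") is the positive class. Person 3 is a true positive, person 1 a false positive, and there are no false negatives, so precision is 0.5, recall is 1.0 and F1 is 2/3.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Predicted and actual outcomes for the same five people, in the same order.
y_pred = [True, False, True, False, False]
y_true = [False, False, True, False, False]

# Binary scoring with True as the positive class.
print(precision_score(y_true, y_pred, pos_label=True))  # 0.5
print(recall_score(y_true, y_pred, pos_label=True))     # 1.0
print(f1_score(y_true, y_pred, pos_label=True))         # about 0.667

# Macro average: unweighted mean of the F1 for the True class (2/3)
# and for the False class (6/7).
print(f1_score(y_true, y_pred, average="macro"))        # about 0.762
```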

Check the example in the Sklearn docs for more: