Hsun-Yi Hsieh - 4 months ago
Python Question

How to handle the score method in sklearn?

This is a follow-up to my previous question, "How to convert a groupby().mean() into a callable object?"

I am grateful for the help that I received from this forum and Alberto Garcia-Raboso in particular who answered my question about this model.

As I proceed, more errors occur, and this one is hard for me to correct. It concerns evaluating the performance of the model. I called .score(pred_values, real_values), but the error says that the input values are not in the [index]:

KeyError: 'None of [[87.333333333333329, 76.0, 81.5, 87.333333333333329, 87.333333333333329, 76.0, 81.5]] are in the [index]'


I am not sure how to interpret this. Where is the index, and how do I access it to fix the problem?

I have been pondering this for a long while, and I still cannot solve the problem. I would be grateful for any assistance. Thank you.

Model

from sklearn.base import BaseEstimator, ClassifierMixin
import pandas as pd
import numpy as np

class MeanClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self):
        pass

    def fit(self, X, y):
        self.name = X
        self.scores = y
        self.data = pd.DataFrame({"name": self.name, "score": self.scores})
        #print(self.data)
        self.means = self.data.groupby(["name"]).mean()
        #print(self.means)
        return self

    def predict(self, X):
        return list(self.means.loc[X, 'score'])


Data inputs and model testing

names = ["John", "Mary", "Suzie", "John", "John", "Mary", "Suzie"]
scores = [80, 70, 75, 90, 92, 82, 88]
dd = pd.DataFrame({"name": names, "score": scores})

ddnames = list(dd['name'])
ddscores = list(dd['score'])

B = MeanClassifier()
Bfit = B.fit(ddnames, ddscores)

Bpred = B.predict(dd['name'])
#print(Bpred)

print(B.score(Bpred, ddscores)) #The error appears here

Answer

There are two problems in your code. The first one is with how you call the score method.

The signature of score is -

score(X, y[, sample_weight])

Here X is your feature set and y is your true data. Note that score calls predict(X) itself and compares the result with y. What you supplied is the predicted list and the true list, so predict tried to look up the predicted means as names in the index of self.means, which is where the KeyError comes from. So change that line to simply -

print(B.score(ddnames, ddscores))

But if you run this you'll get another error -

Can't handle mix of multiclass and continuous

You get this error because you inherit from ClassifierMixin while doing a regression task. ClassifierMixin.score computes classification accuracy, which expects discrete class labels, but your predict method returns continuous values.
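To see the difference between the two mixins without the estimator, here is a minimal sketch that calls the metric each mixin's score relies on (accuracy_score for ClassifierMixin, r2_score for RegressorMixin) on the data from the question. The y_pred values are the per-name means written out by hand, and the exact wording of the error message may differ between sklearn versions.

```python
from sklearn.metrics import accuracy_score, r2_score

y_true = [80, 70, 75, 90, 92, 82, 88]
# Per-name means that predict() returns for the question's data
# (John = 262/3, Mary = 76.0, Suzie = 81.5)
y_pred = [262 / 3, 76.0, 81.5, 262 / 3, 262 / 3, 76.0, 81.5]

# Accuracy is a classification metric and rejects continuous predictions
try:
    accuracy_score(y_true, y_pred)
    accuracy_error = None
except ValueError as exc:
    accuracy_error = exc
print("accuracy_score raised:", accuracy_error)

# R^2 is a regression metric and is defined for continuous values
r2 = r2_score(y_true, y_pred)
print("r2_score:", r2)
```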

So just inherit from RegressorMixin instead, whose score method returns the R^2 coefficient of determination, and you are good to go.

# ... preceding code unchanged ...
from sklearn.base import BaseEstimator, RegressorMixin

class MeanClassifier(BaseEstimator, RegressorMixin):
    def __init__(self):
        pass
# ... remaining code unchanged ...

print(B.score(ddnames, ddscores))  

Output -

0.395607701564
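For reference, here is a self-contained sketch of the whole thing with both fixes applied. It is one possible way to write it, not the only one: the class name MeanRegressor and the attribute means_ are my own renames (the trailing underscore follows the sklearn convention for fitted attributes), and the mixin is listed before BaseEstimator, as the sklearn developer guide recommends. The last lines check that score(X, y) gives the same number as calling r2_score on the output of predict.

```python
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import r2_score


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Predicts each name's mean training score (renamed from MeanClassifier)."""

    def fit(self, X, y):
        data = pd.DataFrame({"name": list(X), "score": list(y)})
        # Series of mean scores indexed by name
        self.means_ = data.groupby("name")["score"].mean()
        return self

    def predict(self, X):
        # Look up the stored mean for every name in X
        return self.means_.loc[list(X)].to_numpy()


names = ["John", "Mary", "Suzie", "John", "John", "Mary", "Suzie"]
scores = [80, 70, 75, 90, 92, 82, 88]

model = MeanRegressor().fit(names, scores)

# score() takes the features and the true values, and calls predict() itself
score_via_method = model.score(names, scores)
score_via_metric = r2_score(scores, model.predict(names))
print(score_via_method, score_via_metric)
```

Both printed values should agree with the output shown above.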