Donbeo - 1 year ago 92

Python Question

I would like to check the prediction error of a new method through cross-validation.

I would like to know whether I can pass my method to sklearn's cross-validation function and, if so, how.

I would like something like `sklearn.cross_validation(cv=10).mymethod`.

I also need to know how to define `mymethod`: should it be a function, what inputs should it take, and what output should it return?

For example, we can take `mymethod` to be an implementation of the least squares estimator (not the ones in sklearn, of course).

I found this tutorial link but it is not very clear to me.

Can anyone help me?

EDIT

In the documentation they use

    >>> import numpy as np
    >>> from sklearn import cross_validation
    >>> from sklearn import datasets
    >>> from sklearn import svm
    >>> iris = datasets.load_iris()
    >>> iris.data.shape, iris.target.shape
    ((150, 4), (150,))
    >>> clf = svm.SVC(kernel='linear', C=1)
    >>> scores = cross_validation.cross_val_score(
    ...     clf, iris.data, iris.target, cv=5)
    ...
    >>> scores

But the problem is that the estimator they use, `clf`, comes from a class built into sklearn. How should I define my own estimator so that I can pass it to the `cross_validation.cross_val_score` function?

EDIT 2

So, for example, suppose a simple estimator that uses a linear model $y=x\beta$, where $\beta$ is estimated as `X[1,:]+alpha` and `alpha` is a parameter. How should I complete the code?

    class my_estimator():
        def fit(X,y):
            beta=X[1,:]+alpha #where can I pass alpha to the function?
            return beta
        def scorer(estimator, X, y) #what should the scorer function compute?
            return ?????

EDIT 3

I received an error

    class my_estimator():
        def fit(X, y, **kwargs):
            #alpha = kwargs['alpha']
            beta=X[1,:]#+alpha
            return beta

    >>> cv=cross_validation.cross_val_score(my_estimator,x,y,scoring="mean_squared_error")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in cross_val_score
        for train, test in cv)
      File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\externals\joblib\parallel.py", line 516, in __call__
        for function, args, kwargs in iterable:
      File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in <genexpr>
        for train, test in cv)
      File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\base.py", line 43, in clone
        % (repr(estimator), type(estimator)))
    TypeError: Cannot clone object '<class __main__.my_estimator at 0x05ACACA8>' (type <type 'classobj'>): it does not seem to be a scikit-learn estimator a it does not implement a 'get_params' methods.
    >>>

EDIT 4

Your sample works, but I am still having a problem. I am trying to implement the algorithm described in a paper (I will give you the link when I am in the office), but it is very simple, so maybe you do not need a reference. Where is the error?

    class Robustness_cv:
        def __init__(self, sim=3):
            self.sim=sim
        def predict(self,X):
            return self.lm.predict(X).tolist() #predict the y using a linear model
        def fit(self,X,y,**kwargs):
            self.sim=kwargs['sim'] #self.sim is the number of iterations
            kf=KFold(len(y),n_folds=10,shuffle=True)
            cv=LassoCV(cv=kf).fit(X,y) #we fit the lasso on our data
            alpha_seq=cv.alphas_ #from the lasso object we take the alphas path
            alpha_min=[]
            #we run cross_validation different time and every times we save the value of
            # alpha_min
            for i in range(self.sim):
                kf=KFold(len(y),n_folds=10,shuffle=True)
                cv=sklearn.linear_model.LassoCV(cv=kf,alphas=alpha_seq).fit(x,y)
                alpha_min.append(cv.alpha_)
            final_alpha=np.percentile(alpha_min,70) #we set the penalty final_alpha
            clf=sklearn.linear_model.Lasso(alpha=final_alpha) # we fit the lasso with
                                                              # penalty final alpha
            self.lm=clf.fit(X,y)
        def get_params(self,deep=False):
            return {'sim',self.sim}

    cv_final=cross_val_score(Robustness_cv(),x,y,fit_params={'sim':3},scoring='mean_squared_error')
    Traceback (most recent call last):
      File "<ipython-input-85-eb591d82ec78>", line 1, in <module>
        cv_final=cross_val_score(Robustness_cv(),x,y,fit_params={'sim':3},scoring='mean_squared_error')
      File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/cross_validation.py", line 1152, in cross_val_score
        for train, test in cv)
      File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/externals/joblib/parallel.py", line 516, in __call__
        for function, args, kwargs in iterable:
      File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/cross_validation.py", line 1152, in <genexpr>
        for train, test in cv)
      File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/base.py", line 46, in clone
        for name, param in six.iteritems(new_object_params):
      File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/externals/six.py", line 268, in iteritems
        return iter(getattr(d, _iteritems)())
    AttributeError: 'set' object has no attribute 'iteritems'

Answer Source

The answer also lies in sklearn's documentation.

You need to define two things:

- an estimator that implements the `fit(X, y)` function, `X` being the matrix with inputs and `y` being the vector of outputs
- a scorer function, or callable object, that can be used with `scorer(estimator, X, y)` and returns the score of the given model

Referring to your example: first of all, `scorer` shouldn't be a method of the estimator; it's a different notion. Just create a callable:

```
def scorer(estimator, X, y):
    return ?????  # compute whatever you want, it's up to you to define
                  # what it means for the given estimator to be "good" or "bad"
```
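As one possible way to fill in the `?????` placeholder above, here is a minimal sketch of a scorer that returns the negative mean squared error, so that a larger value always means a better model. It assumes the estimator exposes a `predict(X)` method; `neg_mse_scorer` and `ConstantModel` are hypothetical names used only for illustration.

```python
import numpy as np


def neg_mse_scorer(estimator, X, y):
    """Return the negative mean squared error of estimator on (X, y)."""
    y_true = np.asarray(y, dtype=float).ravel()
    y_pred = np.asarray(estimator.predict(X), dtype=float).ravel()
    # Negate so that a higher score always means a better fit.
    return -float(np.mean((y_true - y_pred) ** 2))


class ConstantModel:
    """Hypothetical stand-in estimator that always predicts the same value."""

    def __init__(self, value=0.0):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)


X_demo = np.zeros((4, 2))
y_demo = np.array([1.0, 1.0, 3.0, 3.0])
score = neg_mse_scorer(ConstantModel(value=2.0), X_demo, y_demo)
print(score)  # -1.0
```

A callable with this signature can be passed as the `scoring` argument of `cross_val_score` in place of a string.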

Or, even simpler, you can pass a string such as `'mean_squared_error'` or `'accuracy'` (full list available in this part of the documentation) to the `cross_val_score` function to use a predefined scorer.

As for the second thing, you can pass parameters to your model through the `fit_params` `dict` parameter of the `cross_val_score` function (as mentioned in the documentation). These parameters will be passed to the `fit` function.

```
class my_estimator():
    def fit(self, X, y, **kwargs):
        alpha = kwargs['alpha']  # supplied through fit_params={'alpha': ...}
        self.beta = X[1, :] + alpha
        return self
```
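An alternative to `fit_params`, sketched here under the assumption that `alpha` is a hyperparameter rather than data, is to take it in `__init__` and report it from `get_params`. `cross_val_score` clones the estimator for each fold, and cloning works roughly by calling `get_params()` and rebuilding the object from the returned dictionary. That is why EDIT 3 fails without a `get_params` method, and why EDIT 4 fails when it returns the set `{'sim', self.sim}` instead of the dict `{'sim': self.sim}`. `MyEstimator` and `simple_clone` below are hypothetical names; `simple_clone` only imitates the cloning step and is not scikit-learn's actual implementation.

```python
class MyEstimator:
    """Toy estimator from the question: beta is the second row of X plus alpha."""

    def __init__(self, alpha=0.0):
        # Hyperparameters live on the instance, under the same name as the argument.
        self.alpha = alpha

    def get_params(self, deep=False):
        # Must be a dict mapping constructor argument names to values.
        return {"alpha": self.alpha}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # X is a list of rows here; X[1] is the second row, as in the question.
        self.beta_ = [value + self.alpha for value in X[1]]
        return self

    def predict(self, X):
        # Linear model y = x . beta, computed row by row.
        return [sum(x_j * b_j for x_j, b_j in zip(row, self.beta_)) for row in X]


def simple_clone(estimator):
    """Hypothetical stand-in for cloning: rebuild an unfitted copy from its parameters."""
    return type(estimator)(**estimator.get_params())


original = MyEstimator(alpha=0.5).fit([[1.0, 2.0], [3.0, 4.0]], [0.0, 1.0])
fresh = simple_clone(original)

print(fresh.alpha)                     # 0.5
print(hasattr(fresh, "beta_"))         # False: the copy has not been fitted
print(original.predict([[1.0, 1.0]]))  # [8.0]
```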

After reading all the error messages, which give a fairly clear idea of what's missing, here is a simple example:

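```
import numpy as np
# scikit-learn 0.14-era import, matching the version in the tracebacks above
from sklearn.cross_validation import cross_val_score


class RegularizedRegressor:
    def __init__(self, l=0.01):
        self.l = l

    def combine(self, inputs):
        # Weighted sum for one sample; the leading 1 pairs with the bias weight.
        return sum([i * w for (i, w) in zip([1] + list(inputs), self.weights)])

    def predict(self, X):
        return [self.combine(x) for x in np.asarray(X)]

    def classify(self, inputs):
        return np.sign(self.predict(inputs))

    def fit(self, X, y, **kwargs):
        self.l = kwargs.get('l', self.l)
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        # Prepend a column of ones so that weights[0] is the bias used in combine().
        A = np.hstack([np.ones((X.shape[0], 1)), X])
        # Least-squares weights via the pseudo-inverse, which stays defined on
        # tiny folds; note that self.l is stored but not applied as a penalty.
        W = np.dot(np.linalg.pinv(A), y)
        self.weights = [w[0] for w in W.tolist()]
        return self

    def get_params(self, deep=False):
        return {'l': self.l}


X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([0, 1, 1, 0])

print(cross_val_score(RegularizedRegressor(),
                      X,
                      y,
                      fit_params={'l': 0.1},
                      scoring='mean_squared_error'))
```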
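The listing above targets the scikit-learn 0.14 / Python 2 setup shown in the tracebacks. In later releases `cross_val_score` is imported from `sklearn.model_selection` and the predefined scorer is called `'neg_mean_squared_error'`. A rough equivalent under those assumptions might look like the following. `RidgeLikeRegressor` is an illustrative name, the data is synthetic, and, unlike the example above, the penalty `l` is actually applied in `fit` and passed through the constructor rather than through `fit_params`. Inheriting from `BaseEstimator` supplies `get_params` and `set_params` based on the `__init__` signature.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score


class RidgeLikeRegressor(RegressorMixin, BaseEstimator):
    """Least squares with an L2 penalty l (illustrative, not a library class)."""

    def __init__(self, l=0.01):
        self.l = l

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).ravel()
        # Prepend a column of ones so the first weight acts as an intercept.
        A = np.hstack([np.ones((X.shape[0], 1)), X])
        # Solve (A^T A + l I) w = A^T y instead of forming an explicit inverse.
        gram = A.T @ A + self.l * np.eye(A.shape[1])
        self.weights_ = np.linalg.solve(gram, A.T @ y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        A = np.hstack([np.ones((X.shape[0], 1)), X])
        return A @ self.weights_


# Synthetic data, made up for this sketch: y = 1 + 2*x0 - x1 plus small noise.
rng = np.random.RandomState(0)
X = rng.rand(40, 2)
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.01 * rng.randn(40)

scores = cross_val_score(RidgeLikeRegressor(l=0.1), X, y,
                         cv=5, scoring="neg_mean_squared_error")
print(scores)  # five negative MSE values, one per fold
```

Because the scorer reports negated errors, values closer to zero indicate a better fit.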