Edward - 1 year ago 797

Python Question

I am trying to build a pipeline with a variable transformation, and I do it as below:

```
import numpy as np
import pandas as pd
import sklearn
from sklearn import linear_model
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
```

The DataFrame:

```
df = pd.DataFrame({'y': [4,5,6], 'a':[3,2,3], 'b' : [2,3,4]})
```

I try to create a new variable to use for prediction:

```
class Complex():
    def __init__(self, X1, X2):
        self.a = X1
        self.b = X2

    def transform(self, X1, X2):
        age = pd.DataFrame(self.a - self.b)
        return age

    def fit_transform(self, X1, X2):
        self.fit(X1, X2)
        return self.transform(X1, X2)

    def fit(self, X1, X2):
        return self
```

Then I build the pipeline:

```
X = df[['a', 'b']]
y = df['y']
regressor = linear_model.SGDRegressor()
pipeline = Pipeline([
    ('transform', Complex(X['a'], X['b'])),
    ('model_fitting', regressor)
])
pipeline.fit(X, y)
```

When I then call `predict`, I get an error:

```
pred = pipeline.predict(X)
pred
```

```
TypeError                                 Traceback (most recent call last)
<ipython-input-555-7a07ccb0c38a> in <module>()
----> 1 pred = pipeline.predict(X)
      2 pred

C:\Program Files\Anaconda3\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs)
     52
     53         # lambda, but not partial, allows help() to work with update_wrapper
---> 54         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
     55         # update the docstring of the returned function
     56         update_wrapper(out, self.fn)

C:\Program Files\Anaconda3\lib\site-packages\sklearn\pipeline.py in predict(self, X)
    324         for name, transform in self.steps[:-1]:
    325             if transform is not None:
--> 326                 Xt = transform.transform(Xt)
    327         return self.steps[-1][-1].predict(Xt)
    328

TypeError: transform() missing 1 required positional argument: 'X2'
```

What am I doing wrong? I can see the mistake is in the `Complex` class. How can I fix it?


Answer

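So the problem is that `transform` expects a single argument: an *array of shape [n_samples, n_features]*. When you call `predict`, the pipeline calls `transform(Xt)` on every intermediate step with just that one argument (line 326 in your traceback), so a `transform(self, X1, X2)` signature can never match. `fit` only appeared to work because the pipeline calls `fit_transform(X, y)`, which happened to fill `X1` and `X2`. Also note that your `Complex` stores `X['a']` and `X['b']` in its constructor, so even a working `transform` would ignore the data passed to it and always return the training columns.

See the **Examples** section in the documentation of `sklearn.pipeline.Pipeline`: it uses `sklearn.feature_selection.SelectKBest` as a transform, and you can see from its source that it expects `X` to be a single array rather than separate variables like `X1` and `X2`.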
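To see this calling convention directly, here is a minimal sketch with a hypothetical pass-through transformer `CallRecorder` (not part of scikit-learn) that logs every call the pipeline makes to it. It assumes a scikit-learn version where `TransformerMixin` supplies `fit_transform` as `fit(X, y).transform(X)`, and it uses `LinearRegression` as the final step only to keep the example deterministic.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

calls = []  # module-level log filled in by the hypothetical CallRecorder


class CallRecorder(BaseEstimator, TransformerMixin):
    """Hypothetical pass-through step that records how Pipeline calls it."""

    def fit(self, X, y=None):
        # During pipeline.fit, y is forwarded as the second argument
        calls.append(("fit", X.shape, y is not None))
        return self

    def transform(self, X):
        # Pipeline only ever passes one 2-D X here, never a second array
        calls.append(("transform", X.shape))
        return X


df = pd.DataFrame({'y': [4, 5, 6], 'a': [3, 2, 3], 'b': [2, 3, 4]})
pipe = Pipeline([('record', CallRecorder()), ('model', LinearRegression())])
pipe.fit(df[['a', 'b']], df['y'])   # logs fit, then transform (via fit_transform)
pipe.predict(df[['a', 'b']])        # logs a single transform(X) call

for call in calls:
    print(call)
# ('fit', (3, 2), True)
# ('transform', (3, 2))
# ('transform', (3, 2))
```

The last line of the log is exactly the call that failed in the question: `predict` hands the step one argument, so any `transform` with a second required parameter raises the `TypeError`.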

In short, your code can be fixed like this:

```
import pandas as pd
from sklearn import linear_model
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

df = pd.DataFrame({'y': [4,5,6], 'a':[3,2,3], 'b' : [2,3,4]})

class Complex(BaseEstimator, TransformerMixin):
    # TransformerMixin provides fit_transform(X, y) as fit(X, y).transform(X)
    def fit(self, X, y=None):
        # Nothing to learn: the new feature is computed directly from X
        return self

    def transform(self, X):
        # X is the whole feature matrix, so take both columns from it
        return pd.DataFrame(X['a'] - X['b'])

X = df[['a', 'b']]
y = df['y']
regressor = linear_model.SGDRegressor()
pipeline = Pipeline([
    ('transform', Complex()),
    ('model_fitting', regressor)
])
pipeline.fit(X, y)
pred = pipeline.predict(X)
print(pred)
```
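If you would rather not write a class at all, scikit-learn's `FunctionTransformer` can wrap a plain function as a pipeline step. This is an alternative sketch, not the approach in the code above; the function name `a_minus_b` is hypothetical, and `LinearRegression` is swapped in for `SGDRegressor` only so that the predictions are deterministic.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def a_minus_b(X):
    """Hypothetical helper: build the single derived feature a - b."""
    return (X['a'] - X['b']).to_frame(name='a_minus_b')


df = pd.DataFrame({'y': [4, 5, 6], 'a': [3, 2, 3], 'b': [2, 3, 4]})
X = df[['a', 'b']]
y = df['y']

pipeline = Pipeline([
    # validate=False hands the DataFrame to a_minus_b unchanged
    ('transform', FunctionTransformer(a_minus_b, validate=False)),
    ('model_fitting', LinearRegression()),
])
pipeline.fit(X, y)

# The feature is computed from whatever X is passed in, so unseen rows work too
new_X = pd.DataFrame({'a': [5, 1], 'b': [1, 4]})
print(pipeline.predict(new_X))
```

Because the derived column is recomputed from each incoming `X`, predicting on new rows gives sensible results, which the constructor-based `Complex` in the question could not do.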
