user124114 - 1 year ago
Python Question

How to transform items using sklearn Pipeline?

I have a simple scikit-learn Pipeline

of two steps: a
followed by a

I have fit the pipeline using my data. All good.

Now I want to transform (not predict!) an item, using my fitted pipeline.

I tried
, but it gives a different result compared to
. Even the shape and type of the result are different: the first is a 1x3000 CSR matrix, the second a 1x15000 CSC matrix. Which one is correct? Why do they differ?

How do I transform items, i.e. get an item's vector representation before the final estimator, when using scikit-learn's Pipeline?

Answer Source

You can't call transform on a pipeline whose last step is not a transformer. If you want to call transform on such a pipeline, the last estimator must be a transformer.

Even the method's docstring says so:

Applies transforms to the data, and the transform method of the final estimator. Valid only if the final estimator implements transform.
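A quick way to see this for yourself is to check whether a fitted pipeline exposes transform at all. This is only a minimal sketch: the toy documents, the step names and the choice of CountVectorizer, TfidfTransformer and MultinomialNB are my assumptions for illustration, not the steps from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data; the question's real data and steps are not shown.
docs = ["red apple", "green apple", "fast car", "slow car"]
labels = [0, 0, 1, 1]

# Last step is a classifier, which does not implement transform.
clf_pipe = Pipeline([("vec", CountVectorizer()), ("clf", MultinomialNB())])
clf_pipe.fit(docs, labels)
clf_has_transform = hasattr(clf_pipe, "transform")

# Every step, including the last one, is a transformer.
tf_pipe = Pipeline([("vec", CountVectorizer()), ("tfidf", TfidfTransformer())])
tf_pipe.fit(docs)
tf_has_transform = hasattr(tf_pipe, "transform")

# Only the all-transformer pipeline can transform an item.
item_vector = tf_pipe.transform(["red car"])
print(clf_has_transform, tf_has_transform, item_vector.shape)
```

With a reasonably recent scikit-learn, the first pipeline should not expose transform, while the second one does.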

Also, there is no method that applies every step except the last one. However, you can make your own Pipeline that inherits everything from scikit-learn's Pipeline and adds one method, something like:

def just_transforms(self, X):
    """Apply all transforms to the data, without applying the last estimator.

    Parameters
    ----------
    X : iterable
        Data to transform. Must fulfill input requirements of first step of
        the pipeline.
    """
    Xt = X
    # Run X through every step except the final estimator.
    for name, transform in self.steps[:-1]:
        Xt = transform.transform(Xt)
    return Xt
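
To show how such a subclass might be used, here is a minimal sketch. The class name TransformOnlyPipeline, the toy data and the CountVectorizer/MultinomialNB steps are hypothetical choices of mine, not taken from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


class TransformOnlyPipeline(Pipeline):
    """Pipeline with one extra method that skips the final estimator."""

    def just_transforms(self, X):
        Xt = X
        # Apply every fitted step except the last one.
        for name, transform in self.steps[:-1]:
            Xt = transform.transform(Xt)
        return Xt


# Hypothetical toy data, for illustration only.
docs = ["red apple", "green apple", "fast car", "slow car"]
labels = [0, 0, 1, 1]

pipe = TransformOnlyPipeline([("vec", CountVectorizer()), ("clf", MultinomialNB())])
pipe.fit(docs, labels)

item = ["green car"]
vector = pipe.just_transforms(item)

# With a single transformer step, this matches calling that step directly.
direct = pipe.named_steps["vec"].transform(item)
same = bool((vector.toarray() == direct.toarray()).all())
print(vector.shape, same)
```

Recent scikit-learn versions also support slicing a pipeline, so pipe[:-1].transform(item) may give the same result without a subclass; check the documentation for the version you have installed.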