Chris Arthur - 1 year ago
Python Question

Preprocessing in scikit-learn - single sample - Deprecation warning

On a fresh installation of Anaconda under Ubuntu... I am preprocessing my data in various ways prior to a classification task using Scikit-Learn.

from sklearn import preprocessing

scaler = preprocessing.MinMaxScaler().fit(train)
train = scaler.transform(train)
test = scaler.transform(test)

This all works fine, but I also have a new sample (temp below) that I want to classify, so I want to preprocess it in the same way:

temp = [1,2,3,4,5,5,6,....................,7]
temp = scaler.transform(temp)

Then I get a deprecation warning...

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17
and willraise ValueError in 0.19. Reshape your data either using
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
if it contains a single sample.

So the question is how should I be rescaling such a single sample?

I suppose an alternative (not a very good one) would be...

temp = [temp, temp]
temp = scaler.transform(temp)
temp = temp[0]

But I'm sure there are better ways.

Any ideas gratefully received.


Answer

Well, it actually looks like the warning is telling you what to do.

scikit-learn estimators and sklearn.pipeline stages share a uniform interface, so as a rule of thumb:

  • when you see X, it should be an np.array with two dimensions, of shape (n_samples, n_features)

  • when you see y, it should be an np.array with a single dimension, of shape (n_samples,).
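To make the two reshape options from the warning concrete, here is a small sketch using only NumPy. The sample values are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical sample with four feature values (illustrative only)
sample = np.array([1.0, 2.0, 3.0, 4.0])

# One sample with four features: one row, four columns
as_single_sample = sample.reshape(1, -1)

# Four samples with one feature each: four rows, one column
as_single_feature = sample.reshape(-1, 1)

print(sample.shape)             # (4,)
print(as_single_sample.shape)   # (1, 4)
print(as_single_feature.shape)  # (4, 1)
```

Both results are 2D, so both satisfy the rule for X, but only the first one describes a single sample.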

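Here temp is a single sample with many features, so it should become a single row. That is the X.reshape(1, -1) case from the warning (reshaping to (len(temp), 1) would instead be read as many samples with one feature each):

import numpy as np

temp = [1,2,3,4,5,5,6,....................,7]
# This makes it into a 2d array of shape (1, n_features): one sample
temp = np.array(temp).reshape(1, -1)
temp = scaler.transform(temp)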
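For a single sample, the warning's own suggestion is X.reshape(1, -1). Below is a minimal end-to-end sketch of that approach. The training data and the two-feature sample are made up so the snippet can run on its own; they stand in for the train and temp variables from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: 3 samples, 2 features (illustrative only)
train = np.array([[0.0, 10.0],
                  [5.0, 20.0],
                  [10.0, 30.0]])
scaler = MinMaxScaler().fit(train)

# A new single sample, given as a flat list of feature values
temp = [5.0, 30.0]

# reshape(1, -1): one row (one sample), one column per feature
temp_2d = np.array(temp).reshape(1, -1)
scaled = scaler.transform(temp_2d)  # result has shape (1, 2)

# Take row 0 to get a 1D array back for the single sample
scaled_sample = scaled[0]
print(scaled_sample)
```

This gives the same result as the [temp, temp] workaround in the question, without duplicating the sample.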