Katya Handler - 1 year ago 248

Python Question

I am trying to use SKLearn to run an SVM model. I am just trying it out now with some sample data. Here is the data and the code:

```
import numpy as np
from sklearn import svm
import random as random

A = np.array([[random.randint(0, 20) for i in range(2)] for i in range(10)])
lab = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(A, lab)
```

FYI, when I run

```
import sklearn
sklearn.__version__
```

It outputs 0.17.

Now, when I run

```
print(clf.predict([1, 1]))
```

I get this warning:

```
C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
```

It does give me a prediction, which is great. However, I find this weird for a few reasons.

I don't have a 1d array. If you print A, you get

```
array([[ 9, 12],
       [ 2, 16],
       [14, 14],
       [ 4,  2],
       [ 8,  4],
       [12,  3],
       [ 0,  0],
       [ 3, 13],
       [15, 17],
       [15, 16]])
```

Which appears to me to be 2-dimensional. But okay, let's just say that what I have is in fact a 1D array. Let's try to change it using `reshape`. Same code as above, but now we have:

`A = np.array([[random.randint(0, 20) for i in range(2)] for i in range(10)]).reshape(-1,1)`

But then this outputs an array of length 20, which makes no sense and is not what I want. I also tried it with `reshape(1, -1)`.

How can I reshape my data in numpy arrays so that I don't get this warning?

I looked at two answers on SO (Question 1 and Question 2), and neither worked for me. It seems that Q1 was actually 1D data and was solved using `reshape`.

Answer Source

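The warning comes from the `predict` call, not from `fit`. Your training array `A` is 2D, but NumPy interprets the list `[1,1]` that you pass to `predict` as a 1D array of shape `(2,)`. Passing a 2D array with one row should avoid the warning: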

`clf.predict(np.array([[1,1]]))`

Notice that:

```
In [14]: p1 = np.array([1,1])
In [15]: p1.shape
Out[15]: (2,)
In [16]: p2 = np.array([[1,1]])
In [17]: p2.shape
Out[17]: (1, 2)
```
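
As a minimal sketch of the fix (the training values below are copied from the `A` printed in the question so that the run is reproducible; the names `sample`, `single` and `batch` are only illustrative), a sample held in a 1D array can be reshaped to one row before calling `predict`, and several samples can be passed as one row each:

```python
import numpy as np
from sklearn import svm

# Training data copied from the question's printout: 10 samples, 2 features.
A = np.array([[9, 12], [2, 16], [14, 14], [4, 2], [8, 4],
              [12, 3], [0, 0], [3, 13], [15, 17], [15, 16]])
lab = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(A, lab)

# One sample stored as a 1D array: reshape it to (1, n_features).
sample = np.array([1, 1])
single = clf.predict(sample.reshape(1, -1))

# Several samples at once: one row per sample, one column per feature.
batch = clf.predict(np.array([[1, 1], [10, 10], [20, 0]]))

print(single.shape)  # (1,)
print(batch.shape)   # (3,)
```

`predict` returns one label per row, so a single reshaped sample still comes back as an array of length 1.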

Also, note that you can't use an array of shape (2,1). It is read as two samples with one feature each, while the model was trained on two features:

```
In [21]: p3 = np.array([[1],[1]])
In [22]: p3.shape
Out[22]: (2, 1)
In [23]: clf.predict(p3)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-e4070c037d78> in <module>()
----> 1 clf.predict(p3)

/home/juan/anaconda3/lib/python3.5/site-packages/sklearn/svm/base.py in predict(self, X)
    566             Class labels for samples in X.
    567         """
--> 568         y = super(BaseSVC, self).predict(X)
    569         return self.classes_.take(np.asarray(y, dtype=np.intp))
    570

/home/juan/anaconda3/lib/python3.5/site-packages/sklearn/svm/base.py in predict(self, X)
    303         y_pred : array, shape (n_samples,)
    304         """
--> 305         X = self._validate_for_predict(X)
    306         predict = self._sparse_predict if self._sparse else self._dense_predict
    307         return predict(X)

/home/juan/anaconda3/lib/python3.5/site-packages/sklearn/svm/base.py in _validate_for_predict(self, X)
    472             raise ValueError("X.shape[1] = %d should be equal to %d, "
    473                              "the number of features at training time" %
--> 474                              (n_features, self.shape_fit_[1]))
    475         return X
    476

ValueError: X.shape[1] = 1 should be equal to 2, the number of features at training time
```
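
To see why reshaping the training array did not help, it may be useful to compare what `reshape` does to a 10x2 array and to a single sample. This is only a sketch: `A` here is a stand-in built with `np.arange` (an assumption, any 10x2 array behaves the same way), and `x` stands for the sample passed to `predict`.

```python
import numpy as np

# Stand-in for the 10x2 training data (assumed values, not the question's).
A = np.arange(20).reshape(10, 2)

print(A.shape)                 # (10, 2): already 2D, 10 samples x 2 features
print(A.reshape(-1, 1).shape)  # (20, 1): 20 samples with 1 feature each
print(A.reshape(1, -1).shape)  # (1, 20): 1 sample with 20 features

# The sample passed to predict is the array that is actually 1D.
x = np.array([1, 1])

print(x.shape)                 # (2,)
print(x.reshape(1, -1).shape)  # (1, 2): 1 sample with 2 features
print(x.reshape(-1, 1).shape)  # (2, 1): 2 samples with 1 feature each
```

So the training array needs no reshaping. Only the argument to `predict` has to become 2D, with the number of columns matching the number of features used in `fit`.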