Amir Amir - 2 months ago 12
Python Question

Can't get SVC Score function to work

I am trying to run this machine learning program and I get the following error:

ValueError: X.shape[1] = 574 should be equal to 11, the number of features at training time


My Code:

from pylab import *
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import numpy as np

X = list ()
Y = list ()
validationX = list ()
validationY = list ()
file = open ('C:\\Users\\User\\Desktop\\csci4113\\project1\\whitewineTraining.txt','r')
for eachline in file:
    strArray = eachline.split(";")
    row = list ()
    for i in range(len(strArray) - 1):
        row.append(float(strArray[i]))
    X.append(row)
    if (int(strArray[-1]) > 6):
        Y.append(1)
    else:
        Y.append(0)
file2 = open ('C:\\Users\\User\\Desktop\\csci4113\\project1\\whitewineValidation.txt', 'r')
for eachline in file2:
    strArray = eachline.split(";")
    row2 = list ()
    for i in range(len(strArray) - 1):
        row2.append(float(strArray[i]))
    validationX.append(row2)
    if (int(strArray[-1]) > 6):
        validationY.append(1)
    else:
        validationY.append(0)

X = np.array(X)
print (X)
Y = np.array(Y)
print (Y)
validationX = np.array(validationX)
validationY = np.array(validationY)

clf = SVC()
clf.fit(X,Y)
result = clf.predict(validationX)
clf.score(result, validationY)


The goal of the program is to build a model with fit() and then compare its predictions against the validation labels in validationY to see how accurate the model is. Here is the rest of the console output. Keep in mind that X is, confusingly, an 11x574 array!

[[ 7. 0.27 0.36 ..., 3. 0.45 8.8 ]
[ 6.3 0.3 0.34 ..., 3.3 0.49 9.5 ]
[ 8.1 0.28 0.4 ..., 3.26 0.44 10.1 ]
...,
[ 6.3 0.28 0.22 ..., 3. 0.33 10.6 ]
[ 7.4 0.16 0.33 ..., 3.04 0.68 10.5 ]
[ 8.4 0.27 0.3 ..., 2.89 0.3
11.46666667]]
[0 0 0 ..., 0 1 0]
C:\Users\User\Anaconda3\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Traceback (most recent call last):

File "<ipython-input-68-31c649fe24b3>", line 1, in <module>
runfile('C:/Users/User/Desktop/csci4113/project1/program1.py', wdir='C:/Users/User/Desktop/csci4113/project1')

File "C:\Users\User\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)

File "C:\Users\User\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/User/Desktop/csci4113/project1/program1.py", line 43, in <module>
clf.score(result, validationY)

File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\base.py", line 310, in score
return accuracy_score(y, self.predict(X), sample_weight=sample_weight)

File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 568, in predict
y = super(BaseSVC, self).predict(X)

File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 305, in predict
X = self._validate_for_predict(X)

File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 474, in _validate_for_predict
(n_features, self.shape_fit_[1]))

ValueError: X.shape[1] = 574 should be equal to 11, the number of features at training time


Answer

You are simply passing the wrong object to the score function. The documentation clearly states:

score(X, y, sample_weight=None)

X : array-like, shape = (n_samples, n_features) Test samples.

You are passing predictions instead, so

result = clf.predict(validationX)
clf.score(result, validationY)

is invalid and should simply be

clf.score(validationX, validationY)
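As a rough illustration of where the 574 in the error message probably comes from: the predictions are a 1-D array with one label per validation row, and the DeprecationWarning in the question's output says that 1-D input is handled like X.reshape(1, -1), i.e. as a single sample. The sketch below uses only NumPy; the zero-filled result array is a hypothetical stand-in for the real clf.predict(validationX) output, and the length 574 is taken from the error message.

```python
import numpy as np

# Hypothetical stand-in for clf.predict(validationX): one label per validation row.
result = np.zeros(574, dtype=int)

# Per the DeprecationWarning in the question, a 1-D input is treated as a
# single sample, which corresponds to reshaping it with reshape(1, -1).
as_single_sample = result.reshape(1, -1)

print(result.shape)            # shape of the 1-D predictions
print(as_single_sample.shape)  # one "sample" with 574 "features"
```

Read that way, score receives one sample with 574 features, while the model was trained on 11 features, which matches the ValueError.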

What you tried to do would be fine with a standalone scorer (a metric function that compares true labels with predictions), but not with a classifier. A classifier's .score method calls .predict on its own, so you pass it the raw data as the argument.
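A minimal sketch of the two equivalent ways to get the accuracy, assuming randomly generated data in place of the wine files (the arrays X_train, y_train, X_val and y_val and their sizes are made up for illustration; only the 11 feature columns mirror the question). It uses sklearn.metrics.accuracy_score as the standalone metric, which is the function named in the question's traceback.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in data: 11 features per row, binary labels.
rng = np.random.RandomState(0)
X_train = rng.rand(200, 11)
y_train = (X_train[:, 0] > 0.5).astype(int)
X_val = rng.rand(50, 11)
y_val = (X_val[:, 0] > 0.5).astype(int)

clf = SVC()
clf.fit(X_train, y_train)

# Option 1: the classifier's own score method takes the raw features
# and the true labels; it runs predict internally.
acc_from_score = clf.score(X_val, y_val)

# Option 2: predict first, then hand true labels and predictions
# to a metric function.
predictions = clf.predict(X_val)
acc_from_metric = accuracy_score(y_val, predictions)

print(acc_from_score, acc_from_metric)
```

Both values should be the same, since score on a classifier computes the accuracy of its own predictions.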
