Shamsul Masum - 3 months ago
Python Question

How to solve "ValueError: x and y must have same first dimension"?

from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
from sklearn import metrics
from sklearn.cross_validation import train_test_split
import matplotlib.pyplot as plt

r = pd.read_csv("vitalsign_test.csv")
clm_list = []
for column in r.columns:
    clm_list.append(column)
X = r[clm_list[1:len(clm_list)-1]].values
y = r[clm_list[len(clm_list)-1]].values

X_train, X_test, y_train, y_test = train_test_split (X,y, test_size = 0.3, random_state=4)


k_range = range(1,25)
scores = []
for k in k_range:
    clf = KNeighborsClassifier(n_neighbors = k)
    clf.fit(X_train,y_train)
y_pred = clf.predict(X_test)
scores.append(metrics.accuracy_score(y_test,y_pred))

plt.plot(k_range,scores)
plt.xlabel('value of k for clf')
plt.ylabel('testing accuracy')


The response I am getting is:


ValueError: x and y must have same first dimension


My feature and response shapes are:

y.shape
Out[60]: (500,)

X.shape
Out[61]: (500, 6)

Answer

It has nothing to do with your X and y. The error is about the x and y arguments to plt.plot: your scores list has only one element, while k_range has 24 (range(1,25) runs from 1 to 24). The cause is incorrect indentation:

for k in k_range:
    clf = KNeighborsClassifier(n_neighbors = k)
    clf.fit(X_train,y_train)
y_pred = clf.predict(X_test)
scores.append(metrics.accuracy_score(y_test,y_pred))

should be

for k in k_range:
    clf = KNeighborsClassifier(n_neighbors = k)
    clf.fit(X_train,y_train)
    y_pred = clf.predict(X_test)
    scores.append(metrics.accuracy_score(y_test,y_pred))
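
To see why the indentation matters, here is a minimal sketch that reproduces the length mismatch without needing the CSV file or scikit-learn. The function fake_accuracy is a hypothetical stand-in for fitting the classifier and computing metrics.accuracy_score; it is not part of the original code.

```python
k_range = range(1, 25)  # 24 values: 1 through 24


def fake_accuracy(k):
    # Hypothetical stand-in for fit / predict / accuracy_score.
    return 1.0 / k


# Mis-indented version: append runs once, after the loop has finished.
scores_buggy = []
for k in k_range:
    acc = fake_accuracy(k)
scores_buggy.append(acc)

# Correctly indented version: append runs once per value of k.
scores_fixed = []
for k in k_range:
    acc = fake_accuracy(k)
    scores_fixed.append(acc)

print(len(k_range), len(scores_buggy), len(scores_fixed))  # 24 1 24
```

Plotting k_range against scores_buggy would pair 24 x values with 1 y value, which is what raises the ValueError. A quick check such as assert len(k_range) == len(scores) before calling plt.plot can catch this kind of mistake early.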