salols - 5 months ago
Python Question

ValueError: Found arrays with inconsistent numbers of samples: [ 4 16149]

Hi, I'm new to scikit-learn and data science in general. I am running into the above error while trying to retrieve the most informative features from my vectorizer. My code (edited to reflect @Gang's comment):

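import numpy as np
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer

values =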
word_vectorizer = CountVectorizer(analyzer='word', stop_words=custom_stop_words)
trainset = word_vectorizer.fit_transform(values)
tags = ['dem','rep','dem','rep']
tags = np.array(tags)
trainset = trainset.toarray()

word_svm = svm.LinearSVC()
word_svm.fit(trainset, tags)

def most_informative_feature_for_binary_classification(vectorizer, classifier, n=10):
    class_labels = classifier.classes_
    feature_names = vectorizer.get_feature_names()
    # A binary linear model has a single row of coefficients, coef_[0].
    # The n lowest weights are reported for class_labels[0],
    # the n highest weights for class_labels[1].
    topn_class1 = sorted(zip(classifier.coef_[0], feature_names))[:n]
    topn_class2 = sorted(zip(classifier.coef_[0], feature_names))[-n:]

    for coef, feat in topn_class1:
        print class_labels[0], coef, feat

    for coef, feat in reversed(topn_class2):
        print class_labels[1], coef, feat

most_informative_feature_for_binary_classification(word_vectorizer, word_svm)

Terminal output:

Traceback (most recent call last):
  File "", line 251, in <module>
    word_svm.fit(trainset, tags)
  File "/usr/local/lib/python2.7/site-packages/sklearn/svm/", line 205, in fit
    dtype=np.float64, order="C")
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/", line 520, in check_X_y
    check_consistent_length(X, y)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [ 4 16149]

I'd appreciate any and all help on this matter. If I haven't presented enough information, please let me know. Thank you in advance for your time!


Here is where it failed. Both arguments should be the same type (array):

word_svm.fit(trainset, tags)

tags is defined as a plain list, not an array:

tags = ['dem','rep','dem','rep']

so it should be converted to an array. You can use print to check whether they are the same type:

print type(tags)
print type(trainset)
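
Since the error message reports sample counts ([ 4 16149]) rather than types, it may also help to compare how many rows trainset has with how many labels tags has. Below is a minimal sketch in plain Python. The helper names `n_samples` and `describe_mismatch` are hypothetical (they are not part of scikit-learn), and the documents are stand-in data, not the poster's `values`:

```python
def n_samples(data):
    """Number of samples: first dimension if the object has .shape, else len()."""
    shape = getattr(data, "shape", None)
    if shape:  # non-empty shape tuple, e.g. a NumPy array or sparse matrix
        return shape[0]
    return len(data)


def describe_mismatch(X, y):
    """Return None if X and y have the same sample count, else a short message."""
    n_x, n_y = n_samples(X), n_samples(y)
    if n_x == n_y:
        return None
    return "X has %d samples but y has %d labels" % (n_x, n_y)


# Stand-in data (hypothetical): four documents and four labels are consistent.
docs = ["doc one", "doc two", "doc three", "doc four"]
labels = ["dem", "rep", "dem", "rep"]
ok = describe_mismatch(docs, labels)

# Three documents against four labels are not.
bad = describe_mismatch(docs[:3], labels)

print(ok)
print(bad)
```

With the real data, calling `describe_mismatch(trainset, tags)` just before the fit call would show which of the two arguments has 4 entries and which has 16149.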