Firstly, I fit it on the corpus of SMS messages:
from sklearn.feature_extraction.text import CountVectorizer
clf = CountVectorizer()
X_desc = clf.fit_transform(X).toarray()
X.shape = (5574,)
X_desc.shape = (5574, 8713)
str2 = 'Have you visited the last lecture on physics?'
print len(str2), clf.transform(str2).toarray().shape
52 (52, 8713)
You always need to pass an array or list of documents to transform; if you just want to transform a single element, pass a singleton list containing it, and then extract its contents from the result.
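A minimal sketch of the singleton approach follows. The three-message corpus is a hypothetical stand-in for the SMS data, and the names vectorizer, row_matrix and features are chosen only for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical three-message corpus standing in for the SMS data.
corpus = [
    "Have you visited the lecture?",
    "The last lecture was on physics",
    "Call me later",
]

vectorizer = CountVectorizer()
vectorizer.fit(corpus)

str2 = "Have you visited the last lecture on physics?"

# Wrap the single string in a list so it is treated as one document.
row_matrix = vectorizer.transform([str2]).toarray()  # one row, n_features columns

# Take the only row to get a flat feature vector for that document.
features = row_matrix[0]

print(row_matrix.shape, features.shape)
```

The first shape printed should have a single row, and the second should be one-dimensional with one entry per vocabulary term.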
Incidentally, the reason you are getting a 2-dimensional array with one row per character is that a string is itself a sequence of characters, so the vectoriser treats your string as an array in which each character is a separate document.
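The character-by-character behaviour can be seen in plain Python, without the vectoriser. This is only an illustration of how a string iterates; as_documents is a name made up for the example:

```python
str2 = "Have you visited the last lecture on physics?"

# Iterating over a string yields one-character strings, so anything that
# loops over it sees len(str2) separate items.
as_documents = list(str2)

# A singleton list, by contrast, holds exactly one item: the whole string.
single_document = [str2]

print(len(as_documents) == len(str2))  # True
print(len(single_document))            # 1
```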