lordlabakdas - 1 month ago
Python Question

ValueError: Unknown label type: array while using Decision Tree Classifier and using a custom dataset

Given below is my code

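# Imports (not shown in the original post)
import numpy as np
from sklearn import tree
from sklearn.model_selection import train_test_split

dataset = np.genfromtxt('train_py.csv', dtype=float, delimiter=",")
# Last column is the target; all other columns are features
X_train, X_test, y_train, y_test = train_test_split(dataset[:, :-1], dataset[:, -1], test_size=0.2, random_state=0)
model = tree.DecisionTreeClassifier(criterion='gini')
#y_train = y_train.tolist()
#X_train = X_train.tolist()
model.fit(X_train, y_train)
model.score(X_train, y_train)
predicted = model.predict(X_test)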


I am trying to use the decision tree classifier on a custom dataset imported with numpy, but I get the ValueError shown below when I try to fit the model. I tried passing both numpy arrays and plain Python lists, but I still can't figure out what is causing the error. Any help appreciated.

Traceback (most recent call last):
  File "tree.py", line 19, in <module>
    model.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 177, in fit
    check_classification_targets(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 173, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y)

ValueError: Unknown label type: array([[ 252.3352],....<until end of array>

Answer

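A scikit-learn classifier expects the target y to be label-like: integers, strings, and so on. Floats such as 252.3352 are not a typical encoding of a finite set of classes; they are used for regression. If your target really is continuous, use a regressor (for example tree.DecisionTreeRegressor); if it is a class label that happens to be stored as a float, convert it to an integer or string before fitting.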
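As a minimal sketch of the point above, the snippet below reproduces the error and shows the two usual ways out. The data is synthetic and purely hypothetical (it stands in for train_py.csv, which is not available here), and the variable names are made up for illustration.

```python
import numpy as np
from sklearn import tree

# Synthetic stand-in for the CSV (hypothetical data): 3 feature columns
# and a continuous target with values like 252.3352.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y_continuous = rng.rand(20) * 300.0

# A classifier rejects a continuous target with the same ValueError.
try:
    tree.DecisionTreeClassifier(criterion='gini').fit(X, y_continuous)
    classifier_error = None
except ValueError as exc:
    classifier_error = str(exc)

# Option 1: the target really is continuous, so fit a regressor instead.
regressor = tree.DecisionTreeRegressor().fit(X, y_continuous)

# Option 2: the target is a class id that was read as a float (0.0, 1.0, ...),
# so cast it to int to make the labels explicit before fitting a classifier.
y_class_as_float = np.array([0.0, 1.0] * 10)
classifier = tree.DecisionTreeClassifier(criterion='gini').fit(
    X, y_class_as_float.astype(int))
```

Which option applies depends on what the last column of the dataset means; only the asker can tell whether it is a measurement or a class id.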

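From the documentation of fit, on the X argument (here X_train): "The training input samples. Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csc_matrix." That conversion applies to the features only; the target y is not converted and must already be label-like.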
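The traceback passes through sklearn/utils/multiclass.py, and that module also exposes type_of_target, which can be used to check how scikit-learn interprets a target before calling fit. A small sketch with made-up values (none of them come from the asker's dataset):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Non-integer floats are treated as a regression target.
float_kind = type_of_target(np.array([252.3352, 17.5, 3.25]))

# Integers with two distinct values are treated as class labels.
int_kind = type_of_target(np.array([0, 1, 1, 0]))

# Strings with more than two distinct values are class labels as well.
str_kind = type_of_target(np.array(['cat', 'dog', 'bird']))

print(float_kind, int_kind, str_kind)
```

A classifier's fit raises the "Unknown label type" error when this check reports 'continuous' for y.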

Comments