sapo_cosmico - 12 days ago
Python Question

DataConversionWarning fitting RandomForestRegressor in Scikit

I'm trying to fit a RandomForestRegressor to my training set,

rfr.fit(train_X , train_y)


but keep getting the following warning:


/usr/local/lib/python2.7/dist-packages/IPython/kernel/__main__.py:1: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
if __name__ == '__main__':


I am using Pandas and assumed that the training set might need to be in numpy arrays, so I called .values:

train_y = train[label].values
train_X = train[features].values


Checking to see the type, and shape:

print type(train_X), train_X.shape
print type(train_y), train_y.shape


Returns:

<type 'numpy.ndarray'> (20457, 44)
<type 'numpy.ndarray'> (20457, 1)


Not really sure what to do next, only found this answer but it wasn't much help.

It does actually output a result, but I have no idea if it is the right one. With cross validation, it keeps creating that warning over and over again.

Answer

The warning tells you exactly what to do. If you are asking whether the results are correct despite the warning: yes, they are, because what you meant to pass is a 1d vector y, and that is how the single column is treated.

How to get rid of the warning? If you meant y to be a 1d vector and not a column of a matrix, use y.ravel() as the warning says.
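
As a minimal sketch of that fix: the array below is a made-up stand-in for train[label].values from the question (the real data is not shown), and train_y_1d is just an illustrative name. It only demonstrates the shape change that ravel() produces.

```python
import numpy as np

# Hypothetical stand-in for train[label].values from the question:
# a column vector with shape (n_samples, 1).
train_y = np.arange(5, dtype=float).reshape(-1, 1)
print(train_y.shape)  # (5, 1)

# ravel() flattens it to the 1d shape (n_samples,) named in the warning.
train_y_1d = train_y.ravel()
print(train_y_1d.shape)  # (5,)

# The fit call from the question would then become:
# rfr.fit(train_X, train_y_1d)
```

If label in the question is a list holding a single column name, that is probably where the extra dimension comes from: indexing a DataFrame with a list returns a DataFrame, while indexing with a plain string returns a Series whose .values is already 1d.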