gary yong - 4 months ago 35

Python Question

train.sort_values(by=['mass'], ascending=True, inplace=True)

x = train['mass']

y = train['pa']

# Fit regression model

svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)

svr_lin = SVR(kernel='linear', C=1e3)

svr_poly = SVR(kernel='poly', C=1e3, degree=2)

# scikit-learn expects a 2-D feature array of shape (n_samples, 1)

x_train = x.values.reshape(-1, 1)

x = x_train

y_rbf = svr_rbf.fit(x, y).predict(x)

y_lin = svr_lin.fit(x, y).predict(x)

y_poly = svr_poly.fit(x, y).predict(x)

# look at the results

plt.scatter(x, y, c='k', label='data')

plt.plot(x, y_rbf, c='g', label='RBF model')

plt.plot(x, y_lin, c='r', label='Linear model')

plt.plot(x, y_poly, c='b', label='Polynomial model')

plt.xlabel('data')

plt.ylabel('target')

plt.title('Support Vector Regression')

plt.legend()

plt.show()

The code is copied from http://scikit-learn.org/stable/auto_examples/svm/plot_svm_regression.html.

The only thing I changed is the dataset, and I do not know what is going wrong.

Answer

This most likely has to do with the scale of your data. You are using the same penalty hyper-parameter `C` as the example does, but your y values are orders of magnitude larger. As a result, the penalty for error is now small compared to your y values, so the SVR algorithm favors simplicity over accuracy. You need to increase C, to say `1e6` (or normalize your y values).

You can see that this is the case by making C very small in their example code, say `C=.00001`. You then get the same kind of results that you are getting in your code.
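To make the scale effect concrete, here is a minimal sketch on synthetic data (not your `train` DataFrame, which I do not have). The sine-shaped data, the `1e6` multiplier, and the names `raw`, `scaled`, `r2_raw`, and `r2_scaled` are all assumptions for illustration. It fits the same RBF model twice, once on the raw large targets and once with the targets standardized through scikit-learn's `TransformedTargetRegressor`.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real data: one feature, targets of order 1e6.
rng = np.random.RandomState(0)
x = np.sort(5 * rng.rand(80, 1), axis=0)
y = 1e6 * np.sin(x).ravel()

# Same hyper-parameters as in the question, fitted on the raw targets.
raw = SVR(kernel='rbf', C=1e3, gamma=0.1).fit(x, y)

# Same model, but y is standardized before fitting and the predictions
# are mapped back to the original scale automatically.
scaled = TransformedTargetRegressor(
    regressor=SVR(kernel='rbf', C=1e3, gamma=0.1),
    transformer=StandardScaler(),
).fit(x, y)

# R^2 on the training data, only to compare the two fits.
r2_raw = raw.score(x, y)
r2_scaled = scaled.score(x, y)
print('R^2 with raw y:   ', r2_raw)
print('R^2 with scaled y:', r2_scaled)
```

On data like this, the raw fit should stay close to a constant while the scaled fit follows the curve. Raising `C` on the raw targets is the other option mentioned above.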

(More on the algorithm here.)

As a side note, a huge part of Machine Learning practice is hyper-parameter tuning. This is a good example of how even a good base model can yield bad results if provided with the wrong hyper-parameters.
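One common way to do that tuning is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on synthetic data. The data, the grid values, and the names `param_grid` and `search` are assumptions chosen for illustration, not recommended settings for your dataset.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in data with targets of order 1e3 (left unsorted so
# that each cross-validation fold covers the whole x range).
rng = np.random.RandomState(0)
x = 5 * rng.rand(80, 1)
y = 1e3 * np.sin(x).ravel()

# Hypothetical grid: C spans several orders of magnitude because the
# right value depends on the scale of y.
param_grid = {'C': [1e0, 1e2, 1e4], 'gamma': [0.01, 0.1, 1.0]}

# 5-fold cross-validation, scored with SVR's default R^2.
search = GridSearchCV(SVR(kernel='rbf'), param_grid, cv=5)
search.fit(x, y)

print('best parameters:', search.best_params_)
print('best CV R^2:    ', search.best_score_)
```

Searching `C` on a logarithmic grid is a deliberate choice here: a value that works for targets near 1 can be far too small for targets in the thousands or millions.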