cloud36 - 1 month ago
Python Question

Python Pure RMSE vs Sklearn

I believe I'm making an error in my calculation of RMSE in pure Python. The code is below.

import numpy as np

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
e = abs(np.matrix(y_pred) - np.matrix(y_true)).A1  # absolute errors as a flat array
ee = np.dot(e, e)  # sum of squared errors
np.sqrt(ee.sum()/3)

This returns: 0.707


However, when I try with sklearn:

from sklearn.metrics import mean_squared_error

mean_squared_error(np.matrix(y_true), np.matrix(y_pred))**0.5
This returns: 0.612


Any idea what is going on? I'm pretty sure my Python code is correct.

Answer

You're not making an error. You're dividing by 3 (n-1), while sklearn divides by 4 (n):

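import numpy as np

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
e = abs(np.matrix(y_pred) - np.matrix(y_true)).A1  # absolute errors as a flat array
ee = np.dot(e, e)  # sum of squared errors
np.sqrt(ee.sum()/4)  # divide by n = 4, as sklearn does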

0.61237243569579447
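
To make the two divisors explicit, here is a minimal sketch of the same calculation in plain Python, without numpy. The function name rmse and its ddof parameter are hypothetical names chosen for illustration (ddof borrows numpy's "delta degrees of freedom" term); they are not part of sklearn.

```python
import math

def rmse(y_true, y_pred, ddof=0):
    """Square root of the summed squared errors divided by (n - ddof).

    `rmse` and `ddof` are illustrative names, not an sklearn API.
    ddof=0 divides by n; ddof=1 divides by n - 1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / (len(y_true) - ddof))

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

print(rmse(y_true, y_pred))          # divides by 4
print(rmse(y_true, y_pred, ddof=1))  # divides by 3
```

With the data from the question, the first call gives about 0.612 (matching sklearn) and the second about 0.707 (matching the question's result).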

Dividing by n-1 gives an unbiased estimate and is used when calculating second moments (such as the variance) from a sample. When calculating the same moments for a population, we divide by n. sklearn's mean_squared_error is a plain mean of the squared errors, so it divides by n. Here are links that could be relevant: Wikipedia, Some other link
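
The n versus n-1 distinction can be seen directly with Python's standard statistics module, which provides both forms of the variance. This is only a small illustration using the y_true values from the question as sample data; it is not how sklearn computes anything.

```python
import statistics

data = [3, -0.5, 2, 7]
n = len(data)

# Population variance: squared deviations from the mean, divided by n.
population_var = statistics.pvariance(data)

# Sample variance: the same sum of squares, divided by n - 1.
sample_var = statistics.variance(data)

# The two differ only by the factor n / (n - 1).
rescaled = population_var * n / (n - 1)

print(population_var, sample_var, rescaled)
```

The sample variance is always the larger of the two, and the gap shrinks as n grows.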