cloud36 - 5 months ago 54

Python Question

I believe I'm making an error in my calculation of RMSE in pure Python. Below is the code.

    import numpy as np

    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    e = abs(np.matrix(y_pred) - np.matrix(y_true)).A1
    ee = np.dot(e, e)
    np.sqrt(ee.sum() / 3)

This returns: 0.707

However, when I try with sklearn:

`mean_squared_error(np.matrix(y_true),np.matrix(y_pred))**0.5`

This returns: 0.612

Any idea what is going on? I'm pretty sure my Python code is correct.

Answer

You're not making an error. You're dividing by `3` and `sklearn` is dividing by `4`.

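```
import numpy as np

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
e = abs(np.matrix(y_pred) - np.matrix(y_true)).A1  # absolute errors as a flat array
ee = np.dot(e, e)                                  # sum of squared errors
np.sqrt(ee.sum() / 4)                              # divide by n = 4, as sklearn does
# 0.61237243569579447
```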
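If it helps to see both divisors side by side, here is a minimal sketch that needs neither NumPy nor sklearn. The helper name `rmse` and its `ddof` parameter are made up for this illustration and are not part of any library: `ddof=0` divides by `n`, and `ddof=1` divides by `n-1` as in the question.

```python
import math

def rmse(y_true, y_pred, ddof=0):
    """Square root of the summed squared errors divided by (n - ddof).

    Hypothetical helper for illustration only: ddof=0 divides by n,
    ddof=1 reproduces the n-1 divisor used in the question.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / (len(squared_errors) - ddof))

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

rmse_n = rmse(y_true, y_pred)                  # sum of squares 1.5, divided by 4
rmse_n_minus_1 = rmse(y_true, y_pred, ddof=1)  # sum of squares 1.5, divided by 3
print(round(rmse_n, 3), round(rmse_n_minus_1, 3))  # 0.612 0.707
```

The two printed values match the two results in the question, so the only difference between them is the divisor.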

Dividing by `n-1` gives you an unbiased estimate and is used when calculating second central moments (such as the variance) from a sample. When calculating these same moments for a whole population, we divide by `n`. Here are some links that could be relevant: Wikipedia, Some other link.
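The same `n` versus `n-1` distinction shows up in Python's standard-library `statistics` module, which has separate functions for the two cases. This is only a small illustration: it reuses `y_true` from the question as arbitrary sample data, and the variable names are my own.

```python
import statistics

data = [3, -0.5, 2, 7]  # y_true from the question, used here only as sample data
n = len(data)

pop_var = statistics.pvariance(data)    # population variance: divides by n
sample_var = statistics.variance(data)  # sample variance: divides by n - 1

# The two differ only by the factor n / (n - 1).
ratio = sample_var / pop_var
print(pop_var, sample_var, ratio)
```

With only four points the factor `n / (n - 1)` is 4/3, which is why the choice of divisor is so visible on small inputs and fades as `n` grows.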