Dole - 6 months ago 49

Python Question

I want to calculate the least squares estimate for given data.

There are a few ways to do this. One is to use numpy's least squares function:

`import numpy`

`numpy.linalg.lstsq(X, y)[0]`

Here X is a matrix and y is a vector of compatible dimension (both of type float64). The second way is to calculate the result directly using the formula:

`import numpy`

`numpy.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)`

My problem: there are cases where the two formulas give radically different results (although in other cases there is no difference). Sometimes the coefficients grow extremely large with one formula, while the other is much better behaved. The formulas are the same, so why can the results diverge so much? Is this some type of rounding error, and how do I minimize it?

Answer

While those two formulas are mathematically equivalent, they **are not** numerically equivalent! There are better ways to solve a system of linear equations Ax = b than multiplying both sides by A^(-1), such as Gaussian elimination. `numpy.linalg.lstsq` uses more sophisticated, factorization-based methods (an SVD-based LAPACK routine) to solve the least squares problem, and it can handle a lot of corner cases, such as rank-deficient matrices. So use it when you can.
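To make the difference concrete, here is a minimal sketch comparing the two approaches from the question on a deliberately ill-conditioned problem. The test data (a polynomial design matrix built with `numpy.vander` and coefficients that are all ones) is my own assumption for illustration, not something from the question. It also shows `numpy.linalg.solve`, which uses an LU factorization (Gaussian elimination) rather than forming an explicit inverse.

```python
import numpy as np

# Hypothetical test problem: a polynomial (Vandermonde) design matrix,
# chosen only because such matrices are known to be badly conditioned.
t = np.linspace(0.0, 1.0, 50)
X = np.vander(t, 10, increasing=True)   # shape (50, 10)
beta_true = np.ones(10)
y = X @ beta_true                        # exact data, no noise added

# Approach 1: library least squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# Approach 2: explicit inverse of X^T X, as in the question.
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Variant: solve the normal equations without forming an inverse.
# This avoids the inversion step but still works with X^T X.
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Distance of each estimate from the known coefficients.
err_lstsq = np.linalg.norm(beta_lstsq - beta_true)
err_inv = np.linalg.norm(beta_inv - beta_true)
err_solve = np.linalg.norm(beta_solve - beta_true)

print(f"lstsq error: {err_lstsq:.3e}")
print(f"inv error:   {err_inv:.3e}")
print(f"solve error: {err_solve:.3e}")
```

On a problem like this, the `lstsq` estimate should land much closer to the true coefficients than the explicit inverse. The exact numbers depend on your platform and BLAS/LAPACK build, so treat the printed values as indicative only.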

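Explicitly inverting a matrix is numerically unstable when the matrix is ill-conditioned, and forming X^T X makes things worse, because its condition number is the square of that of X. Don't do it unless you have to.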
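If you want to check whether your own data is at risk, one option is to look at the condition number of X and of X^T X. The sketch below uses `numpy.linalg.cond` on a hypothetical polynomial design matrix (again an assumed example, not the asker's data). In the 2-norm, the condition number of X^T X is the square of the condition number of X, so rounding errors get amplified far more when you go through the normal equations.

```python
import numpy as np

# Hypothetical design matrix, used only to illustrate the effect.
t = np.linspace(0.0, 1.0, 50)
X = np.vander(t, 8, increasing=True)   # shape (50, 8)

# 2-norm condition numbers (ratio of largest to smallest singular value).
cond_X = np.linalg.cond(X)
cond_XtX = np.linalg.cond(X.T @ X)

print(f"cond(X)     = {cond_X:.3e}")
print(f"cond(X^T X) = {cond_XtX:.3e}")
print(f"cond(X)**2  = {cond_X**2:.3e}")
```

As a rough rule of thumb, you lose about log10(cond) decimal digits of accuracy, so a large value for X^T X is a sign that you should stick with `numpy.linalg.lstsq`.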