Dole Dole - 2 months ago 6
Python Question

Why does numpy least squares result diverge from using the direct formula?

I want to calculate the least squares estimate for given data.

There are a few ways to do this. One is to use numpy's least squares:

import numpy
numpy.linalg.lstsq(X, y, rcond=None)[0]  # least squares coefficients

Here X is a matrix and y is a vector of compatible dimension (both of type float64). The second way is to calculate the result directly using the formula:

import numpy
numpy.linalg.inv(X.T @ X) @ X.T @ y  # normal equations with an explicit inverse

My problem: in some cases the two formulas give radically different results, although in other cases they agree. Sometimes the coefficients from one formula grow extremely large, while those from the other are much better behaved. The formulas are the same, so why can the results diverge so much? Is this some kind of rounding error, and how do I minimize it?


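While those two formulas are mathematically equivalent, they are not numerically equivalent! There are better ways to solve a system of linear equations Ax = b than multiplying both sides by A^(-1), Gaussian elimination being the classic example. The direct formula is especially fragile because forming X^T X squares the condition number of X, so rounding errors that would be harmless for X alone can dominate the result. numpy.linalg.lstsq never forms an explicit inverse: it calls an SVD-based LAPACK routine to solve the least squares problem, and it also handles corner cases such as rank-deficient matrices. So use it when you can.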
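
To see the effect for yourself, here is a minimal sketch that compares the two approaches on a deliberately ill-conditioned matrix. The data is made up for illustration: the sizes, the seed, the singular values, and the names beta_true, err_lstsq, and err_inverse are all my own assumptions, not anything from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test data: a 50x8 matrix built from chosen singular
# values, so its condition number is about 1e7 by construction.
m, n = 50, 8
U, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal matrix
singular_values = np.logspace(0, -7, n)
X = (U * singular_values) @ V.T

# Known coefficients, so the error of each method can be measured.
beta_true = np.ones(n)
y = X @ beta_true

# Method 1: numpy's least squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# Method 2: the direct formula with an explicit inverse.
beta_inverse = np.linalg.inv(X.T @ X) @ X.T @ y

cond_X = np.linalg.cond(X)
cond_gram = np.linalg.cond(X.T @ X)  # roughly cond_X squared
err_lstsq = np.linalg.norm(beta_lstsq - beta_true)
err_inverse = np.linalg.norm(beta_inverse - beta_true)

print("cond(X)       =", cond_X)
print("cond(X^T X)   =", cond_gram)
print("lstsq error   =", err_lstsq)
print("inverse error =", err_inverse)
```

On data like this the error of the direct formula should come out several orders of magnitude larger than the lstsq error. The exact numbers depend on your platform and LAPACK build.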

Explicit matrix inversion is numerically unstable, especially when the matrix is ill-conditioned. Don't do it unless you have to.
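
If you ever need to code the solve yourself, one common way to avoid both the inverse and the X^T X product is a QR factorization. The sketch below is just one possible alternative, not necessarily what lstsq does internally, and the data, seed, and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical, well-conditioned example data.
X = rng.standard_normal((20, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.standard_normal(20)

# Reduced QR: X = Q R, with Q (20x3) having orthonormal columns
# and R (3x3) upper triangular.
Q, R = np.linalg.qr(X)

# Least squares then reduces to the small system R beta = Q^T y.
# No inverse and no X^T X product is formed.
beta_qr = np.linalg.solve(R, Q.T @ y)

# Compare against numpy's own solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_qr, beta_lstsq))
```

I used numpy.linalg.solve here to keep the example numpy-only; it does not exploit the fact that R is triangular, so a dedicated triangular solver would be the more natural choice in real code.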