Bangyou - 6 months ago 38

R Question

I would like to get the slope of a linear regression for 1M separate data sets (1M * 50 rows in a data.frame, or 1M * 50 in an array). I am currently using the lm function, which takes a very long time (about 10 min).

Is there any faster function for linear regression?

Thanks for any suggestions.

Answer

Yes, there are:

- R itself has `lm.fit()`, which is more bare-bones: no formula notation, much simpler result set.
- Several of our Rcpp-related packages have `fastLm()` implementations: RcppArmadillo, RcppEigen, RcppGSL.

We have described `fastLm()` in a number of blog posts and presentations. If you want the fastest approach, do not use the formula interface: parsing the formula and preparing the model matrix takes more time than the actual regression.
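To illustrate the point about skipping the formula interface, here is a minimal sketch using only base R. The simulated data and the variable names (`x`, `y`, `X`) are assumptions for illustration, not taken from the question; the design matrix is built by hand and passed straight to `lm.fit()`.

```r
# Simulated data: one data set of 50 observations (values are illustrative).
set.seed(42)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.1)

# Formula interface: convenient, but it parses the formula and builds
# a model frame and model matrix before fitting.
slope_lm <- coef(lm(y ~ x))[["x"]]

# Bare-bones interface: supply the design matrix directly
# (a column of ones for the intercept, plus the predictor).
X <- cbind(Intercept = 1, x = x)
slope_fit <- lm.fit(X, y)$coefficients[["x"]]

# Both routes should give the same slope, up to floating-point tolerance.
stopifnot(isTRUE(all.equal(slope_lm, slope_fit)))
```

How much time this saves depends on the data and the machine, so it is worth benchmarking both calls on a sample of your own data sets.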

That said, if you are regressing a single vector on a single vector, you can simplify this further, as no matrix package is needed: the slope is just `cov(x, y) / var(x)`.
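For the single-predictor case, the closed-form slope can be computed for all data sets at once. The sketch below assumes the data are stored as two matrices, `X` and `Y`, with one data set per row; that layout, the helper name `row_slopes`, and the reduced number of data sets are all hypothetical choices for illustration.

```r
# Simulated data: one data set per row (1000 sets here; the question has 1M).
set.seed(1)
n_sets <- 1000
n_obs <- 50
X <- matrix(rnorm(n_sets * n_obs), nrow = n_sets)
Y <- 1 + 2 * X + matrix(rnorm(n_sets * n_obs, sd = 0.1), nrow = n_sets)

# Hypothetical helper: least-squares slope of each row of Y on the same row of X,
# using slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2).
row_slopes <- function(X, Y) {
  Xc <- X - rowMeans(X)  # recycling down columns subtracts each row's own mean
  Yc <- Y - rowMeans(Y)
  rowSums(Xc * Yc) / rowSums(Xc^2)
}

slopes <- row_slopes(X, Y)

# Spot-check the first data set against lm().
check <- coef(lm(Y[1, ] ~ X[1, ]))[[2]]
stopifnot(isTRUE(all.equal(slopes[1], check)))
```

Because this uses only vectorized arithmetic and `rowSums()`, it avoids a per-data-set call to any fitting function. It returns slopes only; if you also need standard errors or residuals, `lm.fit()` or `fastLm()` is the better fit.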