sklearn linear regression for large data


Does it support online/incremental learning?

I have 100 groups of data and want to fit a single model on all of them. Each group has over 10,000 instances and about 10 features, so building one huge matrix (about 10^6 by 10) leads to a memory error with sklearn. It would be nice if I could update the regressor one group at a time, using each new group as a batch of samples.

I found this post relevant, but the accepted solution handles online learning with a single new data point (only one instance) rather than a batch of samples.


Take a look at linear_model.SGDRegressor; it learns a linear model using stochastic gradient descent.

In general, sklearn has many estimators that implement partial_fit. They are all pretty useful on medium to large datasets that don't fit in RAM, because you can feed them the data one batch at a time.
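
For the 100-group case in the question, a minimal sketch could look like the following. It assumes a helper load_group(i) that returns one group's (X, y) at a time; that helper is hypothetical and here just generates synthetic data, so swap in your own loading code. The StandardScaler step is my addition, not something from the question: SGD tends to be sensitive to feature scale, so a first pass fits the scaler incrementally and a second pass trains the regressor.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

N_GROUPS = 100        # number of groups, as in the question
N_SAMPLES = 10_000    # instances per group
N_FEATURES = 10       # features per instance

# Synthetic "true" coefficients, used only to generate example data.
true_coef = np.random.default_rng(0).normal(size=N_FEATURES)

def load_group(i):
    """Hypothetical loader: return (X, y) for group i only.

    Replace the body with code that reads one group from disk.
    """
    rng = np.random.default_rng(i + 1)
    X = rng.normal(size=(N_SAMPLES, N_FEATURES))
    y = X @ true_coef + 0.1 * rng.normal(size=N_SAMPLES)
    return X, y

# Pass 1: learn feature means/variances one group at a time.
scaler = StandardScaler()
for i in range(N_GROUPS):
    X, _ = load_group(i)
    scaler.partial_fit(X)

# Pass 2: update the regressor with one group (batch) at a time.
model = SGDRegressor(random_state=0)
for i in range(N_GROUPS):
    X, y = load_group(i)
    model.partial_fit(scaler.transform(X), y)
```

Only one group is held in memory at any point. Note that each partial_fit call makes a single pass over the batch it is given, so if the fit is not good enough you can loop over the groups more than once.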