Arslán - 1 year ago 122

Python Question

I have a dataset of over 5 GB. Is there a way to train my model on this data chunk by chunk, in a stochastic gradient descent kind of way? In other words, split the set into 5 chunks of 1 GB each and update the parameters on one chunk at a time.

I want to do this in a Python environment.

Answer Source

Yes, you can. The SGD estimators in scikit-learn (`SGDClassifier` and `SGDRegressor`) have a `partial_fit` method; call it once for each of your chunks.

```
partial_fit(X, y[, classes, sample_weight]) Fit linear model with Stochastic Gradient Descent.
```
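As a rough sketch of what that loop could look like: the example below feeds five chunks to `SGDClassifier.partial_fit` one after another. The `iter_chunks` helper and the synthetic data are hypothetical stand-ins for however you actually read your 1 GB pieces from disk; the label set `[0, 1]` is also just an assumption for the demo.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical synthetic data source; replace with your own chunk reader.
rng = np.random.default_rng(0)
N_FEATURES = 10
TRUE_W = rng.normal(size=N_FEATURES)

def iter_chunks(n_chunks, chunk_size=1000):
    """Yield (X, y) pairs one chunk at a time, so only one chunk is in memory."""
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, N_FEATURES))
        y = (X @ TRUE_W > 0).astype(int)
        yield X, y

clf = SGDClassifier(random_state=0)

# For classifiers, the first partial_fit call must be told every label
# that can ever appear, because a single chunk may not contain all of them.
all_classes = np.array([0, 1])

for X_chunk, y_chunk in iter_chunks(n_chunks=5):
    # Each call updates the existing weights instead of refitting from scratch.
    clf.partial_fit(X_chunk, y_chunk, classes=all_classes)

# Evaluate on one further chunk that was not used for training.
X_test, y_test = next(iter_chunks(n_chunks=1))
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

With a real file, something like `pandas.read_csv(path, chunksize=...)` can play the role of `iter_chunks`, since it returns an iterator of DataFrames. Each `partial_fit` call makes a single pass over the chunk it is given, so if one pass over the whole dataset is not enough you can loop over the chunks several times. SGD is also sensitive to feature scale, so it is usually worth scaling the features consistently across chunks.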