I have a dataset of over 5 GB. Is there a way I can train my model on this data chunk by chunk, in a Stochastic Gradient Descent kind of way? In other words, break the set into 5 chunks of 1 GB each, and then update the parameters one chunk at a time.
I want to do this in a Python environment.
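Yes, you can. The SGD estimators in scikit-learn (such as SGDClassifier and SGDRegressor) have a partial_fit method; call it once per chunk instead of calling fit on the whole dataset.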
partial_fit(X, y[, classes, sample_weight]): Fit linear model with Stochastic Gradient Descent.
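Here is a minimal sketch of what that could look like. The chunk generator, its synthetic data, and the names iter_chunks and train_in_chunks are made up for illustration; in practice you would replace the generator with something that reads your 1 GB pieces from disk. Note that for classifiers, the classes argument has to be passed on the first call to partial_fit, because a single chunk may not contain every label.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_chunks(n_chunks=5, chunk_size=2000, n_features=10, seed=0):
    """Yield (X, y) chunks. Synthetic stand-in for reading pieces of a large file."""
    rng = np.random.default_rng(seed)
    # Hypothetical ground-truth weights, fixed so that every chunk shares one rule.
    true_w = np.linspace(-1.0, 1.0, n_features)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, n_features))
        y = (X @ true_w > 0).astype(int)
        yield X, y


def train_in_chunks(chunks, classes):
    """Update one SGDClassifier incrementally, one chunk at a time."""
    model = SGDClassifier(random_state=0)
    for X, y in chunks:
        # Each call runs SGD over this chunk only and keeps the learned weights.
        model.partial_fit(X, y, classes=classes)
    return model


model = train_in_chunks(iter_chunks(), classes=np.array([0, 1]))

# Evaluate on a fresh chunk drawn with a different seed.
X_test, y_test = next(iter_chunks(n_chunks=1, seed=123))
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

If the data lives in a CSV file, pandas.read_csv with the chunksize argument returns an iterator of DataFrames that could feed the same loop. Going through all the chunks once is a single pass over the data, so you may want to repeat the loop a few times, ideally visiting the chunks in a different order each time.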