Arslán Arslán - 1 year ago 158
Python Question

Mining massive data sets in Python

I have a dataset of over 5GBs. Is there a way I can train my model with this data chunk by chunk in a Stochastic Gradient Descent kind of way? In other words, break the set in 5 chunks of 1 GB each, and then train parameters.

I want to do this in a Python environment.

Answer Source

Yes, you can. SGD in scikit learn has partial fit ; use it with your chunks

partial_fit(X, y[, classes, sample_weight]) Fit linear model with Stochastic Gradient Descent.
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download