ThatBird - 1 year ago
Python Question

Saving huge amount of data (nearly 20 billion entries) in django postgresql

I'm trying to save about 15-20 billion entries in a Django model backed by PostgreSQL. I tried Django's bulk_create, but my computer hung for nearly 45 minutes, so I stopped the script. My question is: what is the right way to do this?

Answer Source

anonymous is right about dump files being the best way to load data from/to databases.
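As a rough illustration of that route, PostgreSQL's COPY command can stream a CSV dump straight into the table much faster than row-by-row INSERTs. The sketch below uses psycopg2's copy_expert; the connection string, table name (app_entryobject), column list, and file name are all placeholders you would replace with your own.

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # assumed credentials
with conn, conn.cursor() as cur, open("entries.csv") as f:
    # stream the CSV file into the table using COPY
    cur.copy_expert(
        "COPY app_entryobject (attribute) FROM STDIN WITH CSV",
        f,
    )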

If you don't have access to the database to create a dump file, that path is harder, so a pure-Python way to make it work is to call bulk_create in batches.

For example:

inserts = []
last = len(entries) - 1  # index of the final entry
batch_size = 10000

for i, entry in enumerate(entries):  # or your datasource
    # transform the raw data into an unsaved Django model instance
    inserts.append(EntryObject(attribute='attributes...'))

    # flush once the batch is full, or when the final entry is reached
    if len(inserts) >= batch_size or i == last:
        EntryObject.objects.bulk_create(inserts)  # insert batch
        inserts = []  # reset batch
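Note that bulk_create itself accepts a batch_size argument, so if the whole list of objects fits in memory you can let Django split the INSERTs for you (EntryObject is the same placeholder model as above):

EntryObject.objects.bulk_create(inserts, batch_size=10000)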

Then again, it depends on your datasource. You might also want to look into running the inserts as asynchronous tasks if this needs to be triggered from a Django view.
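For example, a common pattern is to hand the batch insert off to a background worker with Celery. A minimal sketch, assuming Celery is already configured for the project and reusing the hypothetical EntryObject model; the task and module names are made up:

# tasks.py
from celery import shared_task

from myapp.models import EntryObject  # your model


@shared_task
def bulk_insert_entries(raw_rows):
    # raw_rows is a list of plain values taken from your datasource
    objs = [EntryObject(attribute=value) for value in raw_rows]
    EntryObject.objects.bulk_create(objs, batch_size=10000)

The view then only queues the work with bulk_insert_entries.delay(raw_rows) and returns immediately instead of blocking on the inserts.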