ThatBird - 1 month ago
Python Question

Saving a huge amount of data (nearly 20 billion entries) in a Django model with PostgreSQL

I'm trying to save about 15-20 billion entries in a Django model, using PostgreSQL. I tried Django's bulk_create, but my machine hung for nearly 45 minutes and I eventually killed the process. What is the right way to do this?

Answer

anonymous is right about dump files being the best way to load data from/to databases.
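If you can stage the data as a flat file, PostgreSQL's COPY command is usually the fastest way to get it into the table. A minimal sketch, assuming the psycopg2 driver and that myapp_entryobject, its attribute column, and entries.csv are placeholder names for your table and file:

from django.db import connection

def load_csv(path="entries.csv"):
    # Stream the file through PostgreSQL's COPY, which is far faster
    # than issuing individual INSERT statements from Python.
    with open(path) as f, connection.cursor() as cursor:
        cursor.copy_expert(
            "COPY myapp_entryobject (attribute) FROM STDIN WITH CSV",
            f,
        )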

If you don't have direct access to the database to create or restore a dump file, that route is harder, so a pure-Python fallback is to call bulk_create in batches.

For example:

inserts = []
last = len(entries)
batch_size = 10000

for i, entry in enumerate(entries, start=1):  # or iterate your datasource directly
    # transform the raw data into an unsaved Django model instance
    inserts.append(EntryObject(attribute=entry))  # map your source fields here

    if i % batch_size == 0 or i == last:
        EntryObject.objects.bulk_create(inserts)  # insert the whole batch in one query
        inserts = []  # reset the batch

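Note that bulk_create also accepts a batch_size argument, so another option is to let Django split the INSERTs for you:

EntryObject.objects.bulk_create(inserts, batch_size=10000)

With billions of rows, though, you still want to build the inserts list in chunks rather than materialise every object in memory at once.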
Then again, the exact batching depends on your data source. Also, if the import has to be triggered from a Django view, you might want to run the inserts as asynchronous tasks so the request doesn't block; see the sketch below.
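A minimal sketch using Celery, assuming Celery is already wired into the project and that myapp and import_entries are placeholder names:

# tasks.py
from celery import shared_task
from myapp.models import EntryObject

@shared_task
def import_entries(rows):
    # Turn the raw rows into unsaved model instances and insert them in one query.
    EntryObject.objects.bulk_create([EntryObject(attribute=row) for row in rows])

The view would then enqueue chunks with import_entries.delay(chunk) instead of writing to the database inside the request itself.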
