Johannes - 8 months ago
Python Question

Update large sqlite database in chunks

I have an SQLite database (approx. 11 GB) that contains, among others, the tables distance and vertices. The table distance is pretty large (120 million rows); vertices is smaller (15,000 rows). I want to use sqlite3 in Python to update one column of distance with values from a column of vertices. The table vertices has an index on column cat and a second index on another column.
What I am doing:

import sqlite3

conn = sqlite3.connect(db_path)
cur = conn.cursor()

cur.execute('''UPDATE distance SET
from_orig_v = (SELECT orig_cat FROM vertices WHERE cat=distance.source)''')

However, running that update statement on such a large database causes a memory error: memory usage increases steadily until the process crashes. I am looking for advice on how to perform such a large update statement without running out of memory. Maybe by processing the update in chunks (i.e. batches of rows of the distance table) and committing after e.g. 1000 updates to free memory? How would that be done in Python/sqlite?

Answer Source

It should be possible to update chunks with statements like this:

UPDATE distance SET ... WHERE rowid BETWEEN 100000 AND 200000;

You don't need to use multiple transactions; the only thing that actually must be kept in memory is the list of rows to be updated in a single statement. (In theory, when memory runs out, you should get an appropriate error message. In practice, some OSes overcommit memory and don't tell the application about it until it's too late.)
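A minimal sketch of that chunked loop in Python, using the table and column names from the question (distance.from_orig_v, distance.source, vertices.cat, vertices.orig_cat). For demonstration it builds a tiny in-memory database first; the chunk size and sample data are placeholders, not the asker's real values:

```python
import sqlite3

# Demo setup: a tiny in-memory stand-in for the real 11 GB database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE vertices (cat INTEGER PRIMARY KEY, orig_cat INTEGER)")
cur.execute("CREATE TABLE distance (source INTEGER, from_orig_v INTEGER)")
cur.executemany("INSERT INTO vertices VALUES (?, ?)",
                [(i, i * 10) for i in range(1, 6)])
cur.executemany("INSERT INTO distance (source) VALUES (?)",
                [(i % 5 + 1,) for i in range(20)])
conn.commit()

CHUNK = 7  # use something like 100_000 on the real 120-million-row table

# rowids can have gaps, so iterate up to the maximum rather than counting rows.
max_rowid = cur.execute("SELECT max(rowid) FROM distance").fetchone()[0]

for start in range(1, max_rowid + 1, CHUNK):
    cur.execute(
        """UPDATE distance
           SET from_orig_v = (SELECT orig_cat FROM vertices
                              WHERE cat = distance.source)
           WHERE rowid BETWEEN ? AND ?""",
        (start, start + CHUNK - 1),
    )
    conn.commit()  # commit per chunk so the journal and memory stay small

# every row is now filled in
print(cur.execute(
    "SELECT count(*) FROM distance WHERE from_orig_v IS NULL").fetchone()[0])
```

Committing after each chunk keeps the set of pending row updates bounded; the index on vertices.cat makes each correlated subquery a fast lookup.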
