Johannes - 2 months ago
Python Question

Update large sqlite database in chunks

I have an SQLite database (approx. 11 GB) with multiple tables, including the tables distance and vertices. The table distance is pretty large (120 million rows); vertices is smaller (15,000 rows). I want to use sqlite3 in Python to update one column of distance with values from another column in vertices. The table vertices has an index on column cat and another index on orig_cat.

What I am doing:

import sqlite3

db_path = 'path/to/db.db'

conn = sqlite3.connect(db_path)
cur = conn.cursor()

# Single UPDATE statement over all 120 million rows of distance
cur.execute('''UPDATE distance SET
    from_orig_v = (SELECT orig_cat FROM vertices WHERE cat = distance.source)''')


However, running that update statement on such a large database causes a memory error. Memory usage increases steadily until the process crashes. I am looking for advice on how to perform such a large update without running out of memory. Maybe processing the update in chunks (i.e. batches of rows of the distance table) and committing after e.g. 1000 updates to free memory? How would that be done in Python/sqlite?

CL.
Answer

It should be possible to update chunks with statements like this:

UPDATE distance SET ... WHERE rowid BETWEEN 100000 AND 200000;

You don't need to use multiple transactions; the only thing that actually must be kept in memory is the list of rows to be updated in a single statement. (In theory, when memory runs out, you should get an appropriate error message. In practice, some OSes overcommit memory and don't tell the application about it until it's too late.)
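
Translated to Python, a chunked version of the original statement could look roughly like the sketch below. The chunk size of 100,000 is arbitrary, the per-chunk commit is optional (as noted above, a single transaction is fine), and this assumes distance keeps its implicit rowid, i.e. it was not created as a WITHOUT ROWID table:

import sqlite3

db_path = 'path/to/db.db'
chunk_size = 100000  # rows of distance per UPDATE; tune as needed

conn = sqlite3.connect(db_path)
cur = conn.cursor()

# Find the rowid range of distance so it can be walked in chunks.
cur.execute('SELECT MIN(rowid), MAX(rowid) FROM distance')
min_rowid, max_rowid = cur.fetchone()

for start in range(min_rowid, max_rowid + 1, chunk_size):
    end = start + chunk_size - 1
    cur.execute('''UPDATE distance SET
        from_orig_v = (SELECT orig_cat FROM vertices WHERE cat = distance.source)
        WHERE distance.rowid BETWEEN ? AND ?''', (start, end))
    conn.commit()  # optional: commit per chunk, or once at the end

conn.close()

Because vertices has an index on cat, each correlated subquery is a cheap index lookup, so the cost is dominated by rewriting the distance rows themselves.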