duhamp - 9 days ago

Execute Python code efficiently on rows in an SQLite3 table

I have a large table (table1, with columns name, key, info) containing ~1,000,000 rows, on which I need to perform the following:


  1. Select all rows in which info is NULL or "".

  2. Execute a conversion function in Python 3 that converts name to info (let's call it conversion(name)).

  3. Update the row with the new info value.


What is the fastest way to perform this update? Are there any SQLite3 settings that could be enabled to improve performance?

My current research has suggested the following with the SQLite3 library:

cursor = db.cursor()
cursor.execute("SELECT longkey, name FROM table1 WHERE info IS NULL OR info = '';")
rows = cursor.fetchall()
items = []
for row in rows:
    # Convert name to info
    info = conversion(row[1])
    items.append((info, row[0]))
cursor.executemany('UPDATE table1 SET info = ? WHERE longkey = ?;', items)
db.commit()


The problem with this, of course, is the creation of the list rows, which is enormous and very memory-intensive.

I have considered using multiple cursors, but this does not seem to be a good solution.
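One way to keep memory bounded, as a sketch rather than a definitive answer, is to process the table in batches: repeatedly select a limited number of still-unconverted rows, convert them, update them with a second cursor, and commit, so that only one batch is ever held in Python at a time. This assumes the db connection, the table1 schema and the conversion() function from the question, and that conversion() never returns NULL or an empty string (otherwise those rows would be re-selected on the next pass).

BATCH_SIZE = 10000  # rows held in memory at once; tune as needed

read_cursor = db.cursor()
write_cursor = db.cursor()

while True:
    # Select only rows that still need converting; LIMIT keeps memory bounded.
    read_cursor.execute(
        "SELECT longkey, name FROM table1 "
        "WHERE info IS NULL OR info = '' LIMIT ?;",
        (BATCH_SIZE,))
    batch = read_cursor.fetchall()
    if not batch:
        break
    items = [(conversion(name), longkey) for longkey, name in batch]
    write_cursor.executemany('UPDATE table1 SET info = ? WHERE longkey = ?;', items)
    db.commit()  # commit per batch so the next SELECT skips the rows just filled in

Committing after each batch also keeps the implicit transaction small, at the cost of more commits overall.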

Edit: Is using connection.create_function(name, num_params, func) a possible solution to this?

How can I optimise this process to be fast but not extremely memory intensive?

CL.
Answer

When you call the function directly from SQLite, the rows are updated one by one:

db.create_function('conversion', 1, conversion)
cursor.execute("UPDATE table1 SET info = conversion(name);")