MySQL Question

Python threading to process database rows

Given the simple code below:

cur.execute("SELECT * FROM myTable")

for row in cur.fetchall():
    process(row)


If the database is large, this will take a long time to process all the rows. How do I use multi-threading here to speed it up? Specifically, each row should only be processed once, so when multi-threading, each thread should only process one ID.

Do I basically need to get the total number of rows in the table, decide how many threads I want, and then tell each thread the min/max id range it should filter on? Roughly like the sketch below?
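
Assuming an integer id column and a hypothetical get_connection() helper for whichever driver is in use (each thread would need its own connection, since DB-API connections are generally not safe to share between threads):

import threading

NUM_THREADS = 4

def worker(lo, hi):
    # Each thread opens its own connection and cursor.
    conn = get_connection()      # hypothetical helper for the driver in use
    cur = conn.cursor()
    cur.execute("SELECT * FROM myTable WHERE id BETWEEN %s AND %s", (lo, hi))
    for row in cur.fetchall():
        process(row)
    conn.close()

# Split the overall id range into one slice per thread.
cur.execute("SELECT MIN(id), MAX(id) FROM myTable")
min_id, max_id = cur.fetchone()
step = (max_id - min_id) // NUM_THREADS + 1

threads = [threading.Thread(target=worker, args=(lo, lo + step - 1))
           for lo in range(min_id, max_id + 1, step)]
for t in threads:
    t.start()
for t in threads:
    t.join()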

Answer Source

Because the Python GIL limits parallelism in threads, you are likely better off using subprocesses instead of threads. You can use a pool that limits the number of concurrent processes so that you don't over-commit your server; the multiprocessing module should do the trick. The cursor is already an iterator, and with a small chunksize when calling map, it will only pull data as needed. I don't know the size of your rows or the processing time involved, so I picked a modest chunksize and the default pool size, which is the number of cores on your machine.

import multiprocessing as mp

pool = mp.Pool()   # defaults to one worker process per CPU core

cur.execute("SELECT * FROM myTable")

# Rows from the cursor are sent to the worker processes in chunks of 5.
for result in pool.map(process, cur, chunksize=5):
    print(result)
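
One caveat: on platforms where multiprocessing uses the spawn start method (Windows, and macOS on recent Python versions), the pool setup has to live under an if __name__ == '__main__': guard, and process must be defined at module level so the worker processes can import it. A minimal sketch, with get_connection() standing in as a placeholder for whatever opens your MySQL connection:

import multiprocessing as mp

def process(row):
    # CPU-heavy work on a single row; defined at module level so that
    # spawned worker processes can import it.
    ...

if __name__ == "__main__":
    conn = get_connection()      # placeholder: open your MySQL connection here
    cur = conn.cursor()
    cur.execute("SELECT * FROM myTable")

    with mp.Pool() as pool:      # defaults to one worker per CPU core
        for result in pool.map(process, cur, chunksize=5):
            print(result)

If the result set is very large, pool.imap with the same arguments yields results incrementally, whereas map collects everything before returning.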

You can experiment with a thread pool by substituting

from multiprocessing.pool import ThreadPool
pool = ThreadPool()
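
ThreadPool exposes the same map interface, so the loop above works unchanged. Because the threads share memory, rows are not pickled, but the GIL still applies, so this mostly helps when process() is I/O-bound rather than CPU-bound.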