Mohit Verma - 6 months ago
Python Question

Split tasks among threads in Python

I have Python code that reads data as a stream (sys.stdin) and then performs some action for each line.
Now that the volume of data is increasing, I want to split the work among threads and let them run in parallel.

I went through the docs, and most of them suggest that threads need to poll (e.g. from a Queue) to get a task and work on it. Here I need to push tasks to these threads.

Any idea or link where I can figure out how to do this?

import sys

for line in sys.stdin:
    # perform some action, which needs to be split among threads
    # the action is I/O-bound
    pass


One option is that I read from this stream, put each line on a Queue, and let the threads poll from there.
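
A minimal sketch of that Queue-based option, under some assumptions: `some_action`, `process_lines`, the worker count, and the queue size are all hypothetical placeholders, not part of the original code. The workers block inside `Queue.get()` until an item arrives, so they do not spin while waiting; from the reader's side this behaves much like pushing tasks to them.

```python
import queue
import threading

_STOP = object()  # sentinel object that tells a worker to exit


def some_action(line):
    # Hypothetical stand-in for the real I/O-bound task.
    return line.strip().upper()


def worker(tasks, results, lock):
    while True:
        line = tasks.get()  # blocks until the reader puts something on the queue
        if line is _STOP:
            break
        outcome = some_action(line)
        with lock:  # guard the shared results list
            results.append(outcome)


def process_lines(lines, num_workers=4):
    # A bounded queue keeps the reader from running far ahead of the workers.
    tasks = queue.Queue(maxsize=100)
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for line in lines:
        tasks.put(line)
    for _ in threads:
        tasks.put(_STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Calling `process_lines(sys.stdin)` would feed the stream through the workers. Results arrive in completion order, not input order, and this sketch does not handle an exception raised inside `some_action`.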

Answer

Use concurrent.futures (in the standard library since Python 3.2; a backport is available for 2.5+):

from concurrent.futures import ThreadPoolExecutor
import sys

def some_action(line):
    pass # TODO: the actual task

with ThreadPoolExecutor() as executor:
    for line in sys.stdin:
        future = executor.submit(some_action, line)
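
The loop above discards each future, so return values and exceptions from the task are never inspected. A possible way to collect them is sketched here with `as_completed`; `run_all`, the `max_workers` value, and the body of `some_action` are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def some_action(line):
    # Hypothetical stand-in for the real task: length of the stripped line.
    return len(line.strip())


def run_all(lines, max_workers=8):
    outcomes = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which input line each future belongs to.
        futures = {executor.submit(some_action, line): line for line in lines}
        for future in as_completed(futures):
            line = futures[future]
            try:
                outcomes.append((line, future.result()))
            except Exception as exc:
                # result() re-raises whatever the task raised.
                outcomes.append((line, exc))
    return outcomes
```

This version submits every line before it collects anything, so it suits bounded input; for a very long stream the dictionary of pending futures would keep growing.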

Note that if the task is computationally intensive and your Python interpreter is limited by the GIL, you should use a ProcessPoolExecutor instead of a ThreadPoolExecutor.
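
Because both executor classes share the same interface, the switch can be a one-argument change. A rough sketch, assuming a hypothetical CPU-bound function `cpu_heavy` and a hypothetical helper `process_all`:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_heavy(line):
    # Hypothetical CPU-bound stand-in: sum of the character code points.
    return sum(ord(ch) for ch in line)


def process_all(lines, executor_cls=ProcessPoolExecutor, max_workers=None):
    # executor.map returns results in input order, whichever class is used.
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(cpu_heavy, lines))
```

With a process pool the task function must be defined at module top level so it can be pickled, and on platforms that start workers by spawning a fresh interpreter the call should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` runs the same code with threads.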