
Python - multiprocessing pool.map not processing list in order

I have this script to process some URLs in parallel:

import multiprocessing
import time

# Build the ordered list of URLs: page=1 .. page=999
list_of_urls = []
for i in range(1, 1000):
    list_of_urls.append('http://example.com/page=' + str(i))

def process_url(url):
    page_processed = url.split('=')[1]
    print('Processing page %s' % page_processed)
    time.sleep(5)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(process_url, list_of_urls)


The list is ordered, but when I run it, the script doesn't pick URLs from the list in order:

Processing page 1
Processing page 64
Processing page 127
Processing page 190
Processing page 65
Processing page 2
Processing page 128
Processing page 191


Instead, I would like it to process pages 1, 2, 3, 4 first, and then continue following the order of the list. Is there an option to do this?

Answer

If you do not pass a chunksize argument, map will calculate the chunk size using this algorithm:

chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
    chunksize += 1
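
For the question's 999 URLs and a Pool of 4 processes, that formula gives a chunk size of 63, which is exactly the stride visible in the output above (pages 1, 64, 127 and 190 are the first items of the first four chunks). A quick sketch of the arithmetic, using the numbers from the question (the variable names here are just illustrative):

n_tasks = 999                  # len(list_of_urls) for range(1, 1000)
n_workers = 4                  # Pool(processes=4)

chunksize, extra = divmod(n_tasks, n_workers * 4)   # divmod(999, 16) -> (62, 7)
if extra:
    chunksize += 1             # 62 + 1 = 63

print(chunksize)               # 63 -> chunks start at pages 1, 64, 127, 190, ...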

map cuts your iterable into chunks of that size and hands each chunk to a separate worker process as one task, which is why the pages are not processed in list order. The solution is to pass chunksize=1.

import multiprocessing
import time

list_test = range(10)

def proces(task):
    print("task:", task)
    time.sleep(1)

if __name__ == '__main__':
    # chunksize=1 hands the workers one item at a time, in list order
    pool = multiprocessing.Pool(processes=3)
    pool.map(proces, list_test, chunksize=1)

task: 0
task: 1
task: 2
task: 3
task: 4
task: 5
task: 6
task: 7
task: 8
task: 9
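
Note that even with chunksize=1 the printed lines can still interleave occasionally, because the workers run concurrently. What pool.map does guarantee, regardless of execution order, is that the returned list is in the same order as the input. A minimal sketch of relying on that guarantee (square is just a hypothetical stand-in for real per-item work):

import multiprocessing

def square(n):
    # Hypothetical stand-in for real work; any picklable function works here.
    return n * n

if __name__ == '__main__':
    with multiprocessing.Pool(processes=3) as pool:
        # Workers may execute items in any order, but the result list
        # always matches the input order.
        results = pool.map(square, range(10), chunksize=1)
    print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]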