zell zell - 7 months ago 23
Python Question

My Python multiprocessing program runs the worker functions sequentially

I have written a Python program using multiprocessing. The program spawns 8 workers, each of which outputs a random number after sleeping for 3 seconds. I expect the program to finish in about 3 seconds, but it takes 24 seconds, as if the worker functions were evaluated sequentially rather than in parallel. Any ideas?

import time
import numpy as np
import multiprocessing as mp

def f(i):
    np.random.seed(int(time.time() + i))
    time.sleep(3)
    res = np.random.rand()
    print("From i =", i, "res =", res)


if __name__ == '__main__':
    num_workers = mp.cpu_count()  # My CPU has 8 cores.
    pool = mp.Pool(num_workers)
    for i in range(num_workers):
        p = pool.apply_async(f, args=(i,))
        p.get()

    pool.close()
    pool.join()


However, if I use Process instead of Pool, I get the expected parallel behavior:

import time
import numpy as np
import multiprocessing as mp

def f(i):
    np.random.seed(int(time.time() + i))
    time.sleep(3)
    res = np.random.rand()
    print("From i =", i, "res =", res)
    if res > 0.7:
        print("find it")


if __name__ == '__main__':
    num_workers = mp.cpu_count()
    pool = mp.Pool(num_workers)  # note: this pool is created but never used below
    for i in range(num_workers):
        p = mp.Process(target=f, args=(i,))
        p.start()
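One caveat with this variant (my observation, not from the original post): the main process never waits for the workers, so it can exit before they finish printing. A minimal sketch that starts every process first and only then joins them all (the worker body here is a placeholder standing in for the real f above):

    import time
    import multiprocessing as mp

    def f(i):
        # Placeholder for the real work: sleep briefly, then exit.
        time.sleep(1)

    if __name__ == '__main__':
        num_workers = mp.cpu_count()
        procs = [mp.Process(target=f, args=(i,)) for i in range(num_workers)]
        for p in procs:
            p.start()   # launch every worker before waiting on any
        for p in procs:
            p.join()    # then wait for all of them to finish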

Answer

Think about what you're doing here:

for i in range(num_workers):
    p=pool.apply_async(f, args=(i,))
    p.get()

Each time through the loop, you send some work off to a pool process, and then (via .get()) you explicitly wait for that process to return its result. So of course nothing much happens in parallel.

The usual way to do this is more like:

workers = [pool.apply_async(f, args=(i,)) for i in range(num_workers)]
for w in workers:
    w.get()

That is, you start all the workers you want before you wait for any of them.
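Putting it together, a minimal self-contained sketch (with timing added so you can verify the parallelism; the worker returns a value instead of printing, which is an assumption of mine, not part of the original code):

    import time
    import multiprocessing as mp

    def f(i):
        # Simulate 3 seconds of work, then return a result.
        time.sleep(3)
        return i * i

    if __name__ == '__main__':
        num_workers = mp.cpu_count()
        start = time.time()
        pool = mp.Pool(num_workers)
        # Submit all the jobs first, then collect all the results.
        workers = [pool.apply_async(f, args=(i,)) for i in range(num_workers)]
        results = [w.get() for w in workers]
        pool.close()
        pool.join()
        elapsed = time.time() - start
        print("results:", results)
        print("elapsed: %.1f s" % elapsed)

Since all the sleeps overlap, elapsed time stays near 3 seconds rather than 3 * num_workers. For this simple submit-everything-then-collect pattern, pool.map(f, range(num_workers)) would do the same thing in one call.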