user2133814 - 3 months ago
Python Question

Python multiprocessing pool with expensive initialization

Here is a complete, simple working example:

import multiprocessing as mp
import time
import random


class Foo:
    def __init__(self):
        # some expensive set up function in the real code
        self.x = 2
        print('initializing')

    def run(self, y):
        time.sleep(random.random() / 10.)
        return self.x + y


def f(y):
    foo = Foo()
    return foo.run(y)


def main():
    pool = mp.Pool(4)
    for result in pool.map(f, range(10)):
        print(result)
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()


How can I modify it so that Foo is initialized only once per worker, rather than once per task? In other words, I want __init__ called 4 times, not 10. I am using Python 3.5.

Answer

The intended way to handle this is the optional initializer and initargs arguments to the Pool() constructor. They exist precisely so that you can run setup code exactly once when each worker process is created. For example, add:

def init():
    global foo
    foo = Foo()

and change the Pool creation to:

pool = mp.Pool(4, initializer=init)

If you need to pass arguments to your per-process initialization function, also add an appropriate initargs=... argument (a tuple of positional arguments that the pool passes to the initializer).
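As a minimal sketch of the initargs variant: the x parameter on Foo.__init__() and init() is hypothetical, added here only to show how a value travels from initargs into each worker. It is not part of the original code.

```python
import multiprocessing as mp


class Foo:
    def __init__(self, x):
        # Stand-in for the expensive setup; x is a hypothetical parameter.
        self.x = x

    def run(self, y):
        return self.x + y


def init(x):
    # Runs once in each worker process; the pool calls init(*initargs).
    global foo
    foo = Foo(x)


def f(y):
    # Uses the per-process global created by init().
    return foo.run(y)


if __name__ == '__main__':
    # initargs must be a tuple, hence the trailing comma.
    with mp.Pool(4, initializer=init, initargs=(2,)) as pool:
        print(pool.map(f, range(10)))
```

The trailing comma in initargs=(2,) matters: without it, (2) is just the integer 2 rather than a one-element tuple.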

Note: you should also remove the

foo = Foo()

line from f(), so that f() uses the global foo created by init().
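Putting the pieces together, the question's example might look like the sketch below. The only addition beyond the changes described above is the os.getpid() call in the print, included purely as an illustration so you can see which worker ran the initialization.

```python
import multiprocessing as mp
import os
import random
import time


class Foo:
    def __init__(self):
        # some expensive set up function in the real code
        self.x = 2
        # os.getpid() is an illustrative addition to identify the worker.
        print('initializing in worker', os.getpid())

    def run(self, y):
        time.sleep(random.random() / 10.)
        return self.x + y


def init():
    # Called once per worker process when the pool starts it.
    global foo
    foo = Foo()


def f(y):
    # No Foo() here any more; reuse the worker's global instance.
    return foo.run(y)


def main():
    pool = mp.Pool(4, initializer=init)
    for result in pool.map(f, range(10)):
        print(result)
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()
```

With a pool of 4 workers, the 'initializing' message should appear once per worker process instead of once per task, followed by the results 2 through 11 in order.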
