CodeNoob CodeNoob - 1 month ago 14
Python Question

Python multiprocessing pickling error

I am sorry that I can't reproduce the error with a simpler example, and my code is too complicated to post. If I run the program in IPython shell instead of the regular python, things work out well.

I looked up some previous notes on this problem. They were all caused by using pool to call function defined within a class function. But this is not the case for me.

Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed


I would appreciate any help.

UPDATE: The function I pickle is defined at the top level of the module. Though it calls a function that contains a nested function. i.e, f() calls g() calls h() which has a nested function i(), and I am calling pool.apply_async(f). f(), g(), h() are all defined at the top level. I tried simpler example with this pattern and it works though.

Answer

Here is a list of what can be pickled. In particular, functions are only picklable if they are defined at the top-level of a module.

This piece of code:

import multiprocessing as mp

class Foo():
    @staticmethod
    def work(self):
        pass

pool = mp.Pool()
foo = Foo()
pool.apply_async(foo.work)
pool.close()
pool.join()

yields an error almost identical to the one you posted:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 315, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

The problem is that the pool methods all use a queue.Queue to pass tasks to the worker processes. Everything that goes through the queue.Queue must be pickable, and foo.work is not picklable since it is not defined at the top level of the module.

It can be fixed by defining a function at the top level, which calls foo.work():

def work(foo):
    foo.work()

pool.apply_async(work,args=(foo,))

Notice that foo is pickable, since Foo is defined at the top level and foo.__dict__ is picklable.