YXD - 3 months ago

Understanding an issue with the namedtuple typename and pickle in Python

Earlier today I was having trouble trying to pickle a namedtuple instance. As a sanity check, I tried running some code that was posted in another answer. Here it is, simplified a little more:

from collections import namedtuple
import pickle

P = namedtuple("P", "one two three four")

def pickle_test():
    abe = P("abraham", "lincoln", "vampire", "hunter")
    # pickle writes bytes, so open the file in binary mode
    with open('abe.pickle', 'wb') as f:
        pickle.dump(abe, f)

pickle_test()


I then changed two lines of this to use my named tuple:

from collections import namedtuple
import pickle

P = namedtuple("my_typename", "A B C")

def pickle_test():
    abe = P("ONE", "TWO", "THREE")
    # pickle writes bytes, so open the file in binary mode
    with open('abe.pickle', 'wb') as f:
        pickle.dump(abe, f)

pickle_test()


However, this gave me the error:

File "/path/to/anaconda/lib/python2.7/pickle.py", line 748, in save_global
(obj, module, name))
pickle.PicklingError: Can't pickle <class '__main__.my_typename'>: it's not found as __main__.my_typename


i.e. the pickle module is looking for my_typename. I changed the line P = namedtuple("my_typename", "A B C") to P = namedtuple("P", "A B C"), and it worked.

I looked at the source of namedtuple.py, and at the end there is something that looks relevant, but I don't fully understand what is happening:

# For pickling to work, the __module__ variable needs to be set to the frame
# where the named tuple is created. Bypass this step in enviroments where
# sys._getframe is not defined (Jython for example) or sys._getframe is not
# defined for arguments greater than 0 (IronPython).
try:
    result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
except (AttributeError, ValueError):
    pass

return result


So my question is: what exactly is going on? Why does the typename argument need to match the name of the variable the factory's result is assigned to for this to work?

Answer

The section of the Python documentation titled "What can be pickled and unpickled?" says that only "classes that are defined at the top level of a module" can be pickled. namedtuple() is a factory function: in your second example it effectively defines a class my_typename(tuple), but it does not assign the manufactured type to a variable named my_typename at the top level of the module. The only top-level name bound to it is P.
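A quick Python 3 check (a sketch, assuming it is run as a script) makes the distinction visible: the class's own name comes from the typename argument, while P is just a variable that happens to refer to it.

```python
from collections import namedtuple

P = namedtuple("my_typename", "A B C")

# The manufactured class is named after the typename argument...
print(P.__name__)                  # my_typename
# ...while P is only a variable bound to that class.
print("P" in globals())            # True
print("my_typename" in globals())  # False: no top-level name my_typename exists
```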

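The reason for this rule is that pickle saves only the “fully qualified” name of such objects (the module name plus the class name), not their code. To unpickle them later, pickle must be able to import that module and find the object in it under that name, hence the requirement that the module contain the named object at the top level. pickle already checks this while pickling, which is why your error is raised from save_global at pickling time rather than later when unpickling.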
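The following Python 3 sketch shows both halves of that: the pickled bytes refer to the class only by module and name, and loading them fails once that name disappears from the module. To keep it independent of how the script is launched, it uses a throwaway module object; the module name "demo_mod" is hypothetical, and the module= argument of namedtuple() requires Python 3.6 or later.

```python
import pickle
import sys
import types
from collections import namedtuple

# Hypothetical module "demo_mod", created on the fly and registered in
# sys.modules so that pickle can import it by name like a real module.
demo = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo

# Define the class "in" demo_mod and bind it there under its own name.
demo.my_typename = namedtuple("my_typename", "A B C", module="demo_mod")

data = pickle.dumps(demo.my_typename("ONE", "TWO", "THREE"))

# The pickle names the module and the class; it carries no class code.
print(b"demo_mod" in data, b"my_typename" in data)  # True True

# Unpickling imports demo_mod and looks my_typename up in it.
restored = pickle.loads(data)
print(restored)  # my_typename(A='ONE', B='TWO', C='THREE')

# Remove the top-level name and the very same bytes can no longer be loaded.
del demo.my_typename
failed = False
try:
    pickle.loads(data)
except AttributeError as exc:
    failed = True
    print("unpickling failed:", exc)
```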

One workaround illustrates this: change one line of the code so that the type named my_typename is also bound at the top level under that name:

P = my_typename = namedtuple("my_typename", "A B C")

Alternatively, you could just give the namedtuple the name "P" instead of "my_typename":

P = namedtuple("P", "A B C")
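Both workarounds can be compared with the failing version in one Python 3 sketch. As before, it uses throwaway modules so the result does not depend on how the script is run; make_module, round_trips and the module names are hypothetical helpers for illustration only, and module= needs Python 3.6+.

```python
import pickle
import sys
import types
from collections import namedtuple


def make_module(name):
    """Create and register a throwaway module (hypothetical helper)."""
    mod = types.ModuleType(name)
    sys.modules[name] = mod
    return mod


def round_trips(obj):
    """Return True if obj survives pickle.dumps followed by pickle.loads."""
    try:
        return pickle.loads(pickle.dumps(obj)) == obj
    except pickle.PicklingError:
        return False


# The failing case: the class is called my_typename but only bound as P.
broken = make_module("broken_mod")
broken.P = namedtuple("my_typename", "A B C", module="broken_mod")

# Workaround 1: also bind the class under its typename.
both = make_module("both_mod")
both.P = both.my_typename = namedtuple("my_typename", "A B C", module="both_mod")

# Workaround 2: make the typename match the variable name.
same = make_module("same_mod")
same.P = namedtuple("P", "A B C", module="same_mod")

for mod in (broken, both, same):
    print(mod.__name__, round_trips(mod.P("ONE", "TWO", "THREE")))
# broken_mod False
# both_mod True
# same_mod True
```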

As for what that namedtuple.py source code you were looking at does: sys._getframe(1) is the frame of the code that called namedtuple(), and the '__name__' in that frame's globals is the name of the caller's module. Assigning it to result.__module__ records which module pickle should import the class from. The author does this because pickle will use that module name to find the definition when unpickling, and because people commonly assign the result to a variable with the same name they passed to the factory function (which you didn't do in the second example).
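A stripped-down factory shows the frame trick in isolation. This is a Python 3 sketch, not the real namedtuple code: make_record_type and the module name "user_mod" are hypothetical, and sys._getframe is CPython-specific. "User code in another module" is simulated by running one line with a throwaway module's namespace as its globals. Without the try block, __module__ would name the module that defines the factory (for the real namedtuple in Python 3, that is collections) instead of the caller's module.

```python
import sys
import types


def make_record_type(typename):
    """Hypothetical factory that copies namedtuple's __module__ trick."""
    cls = type(typename, (tuple,), {"__slots__": ()})
    try:
        # Frame 0 is make_record_type itself; frame 1 is whoever called it.
        # The caller's globals hold the __name__ of the caller's module.
        cls.__module__ = sys._getframe(1).f_globals.get("__name__", "__main__")
    except (AttributeError, ValueError):
        # No usable sys._getframe (some non-CPython implementations): keep the default.
        pass
    return cls


# Simulate a call from another module by executing code in its namespace.
user_mod = types.ModuleType("user_mod")
user_mod.make_record_type = make_record_type
exec('Record = make_record_type("Record")', user_mod.__dict__)

print(user_mod.Record.__module__)  # user_mod
```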
