user3798292 - 3 months ago
Python Question

Saving functions using shelve

I'm trying to use the shelve Python module to save my session output and reload it later, but I have found that if I have defined functions, I get an error at the reloading stage. Is there a problem with the way I am doing it? I based my code on an answer to the question "How can I save all the variables in the current python session?"

Here's some simple code that reproduces the error:

def test_fn():  # simple test function
    return

import shelve
my_shelf = shelve.open('test_shelve', 'n')

for key in globals().keys():
    try:
        my_shelf[key] = globals()[key]
    except:  # __builtins__, my_shelf, and imported modules cannot be shelved.
        pass

my_shelf.close()


Then, after exiting, I can see that the shelf files were written:

ls -lh test_shelve*
-rw-r--r-- 1 user group 22K Aug 24 11:16 test_shelve.bak
-rw-r--r-- 1 user group 476K Aug 24 11:16 test_shelve.dat
-rw-r--r-- 1 user group 22K Aug 24 11:16 test_shelve.dir


In general, in a new IPython session I want to be able to do something like:

import shelve
my_shelf = shelve.open('test_shelve')
for key in my_shelf:
    globals()[key] = my_shelf[key]


This produces an error for key 'test_fn'. Here is some code to demonstrate the error:

print my_shelf['test_fn']
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-deb481380237> in <module>()
----> 1 print my_shelf['test_fn']

/home/user/anaconda2/envs/main/lib/python2.7/shelve.pyc in __getitem__(self, key)
120 except KeyError:
121 f = StringIO(self.dict[key])
--> 122 value = Unpickler(f).load()
123 if self.writeback:
124 self.cache[key] = value

AttributeError: 'module' object has no attribute 'test_fn'


Of course, one solution would be to exclude functions at the saving stage, but from what I have read it should be possible to restore them this way, so I wonder whether I am doing something wrong.

Answer

You can't use shelve (or pickle, the actual protocol used by shelve) to store executable code, no.

What is stored is a reference to the function: just the location from which the function can be imported again. Code is not data; only the fact that you referenced a function is data here. When you load the stored information, pickle expects to be able to import the same module and look up the same function again.
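A minimal Python 3 sketch (the question itself uses Python 2.7) of what actually ends up in the pickle: the payload holds the defining module's name and the function's name, while the function body and docstring never appear in it. Loading in the same process simply looks the name up again and returns the very same function object.

```python
import pickle

def test_fn():
    """This docstring and the body below are not written to the pickle."""
    return 42

data = pickle.dumps(test_fn)

# Only the defining module's name and the function's name are in the payload
print(data)
print("module name stored:", test_fn.__module__.encode() in data)
print("function name stored:", b"test_fn" in data)
print("docstring stored:", b"This docstring" in data)

# Loading looks the name up again, so we get back the original object
restored = pickle.loads(data)
print("same object:", restored is test_fn)
```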

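The same applies to classes: if you pickle a reference to a class, only the information needed to import the class again is stored. If you pickle an instance of a class, its attribute data is stored together with that import reference, which is used to re-create the instance; the class's methods are not stored.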
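To see this for an instance, the Python 3 sketch below pickles a small hypothetical `Point` class instance with protocol 2 and lists the string arguments in the stream using `pickletools.genops`. Expect a reference to the class plus the attribute names, but nothing about the `norm2` method, which is found on the class again when the instance is rebuilt.

```python
import pickle
import pickletools

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm2(self):
        return self.x ** 2 + self.y ** 2

data = pickle.dumps(Point(3, 4), protocol=2)

# String arguments in the stream: a class reference plus the attribute names
strings = [arg for _, arg, _ in pickletools.genops(data) if isinstance(arg, str)]
print(strings)

# The method is not stored; it comes from the class when the instance is rebuilt
print("method stored:", b"norm2" in data)
q = pickle.loads(data)
print(q.x, q.y, q.norm2())
```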

All this is done because you already have a persisted and loadable representation of that function or class: the module that defines them. There is no need to store another copy.
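Following that reasoning, one way to make a shelved function reloadable is to define it in a module that the loading session can import, rather than in `__main__`. The sketch below writes a hypothetical module, `session_funcs.py`, into a temporary directory, puts that directory on `sys.path`, and round-trips the function through shelve. It runs in a single process, so the reload finds the module already imported; in a genuinely new session pickle would import `session_funcs` itself, provided the module is still on `sys.path`.

```python
import importlib
import os
import shelve
import sys
import tempfile

# Write a hypothetical helper module into a scratch directory
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "session_funcs.py"), "w") as f:
    f.write("def test_fn():\n    return 42\n")

# Make the module importable, then import it
sys.path.insert(0, workdir)
importlib.invalidate_caches()
session_funcs = importlib.import_module("session_funcs")

shelf_path = os.path.join(workdir, "test_shelve")

# Save: the shelf records only the reference session_funcs.test_fn
with shelve.open(shelf_path, "n") as my_shelf:
    my_shelf["test_fn"] = session_funcs.test_fn

# Load: pickle imports session_funcs (already cached here) and looks up test_fn
with shelve.open(shelf_path) as my_shelf:
    restored = my_shelf["test_fn"]

print(restored())
```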

This is documented explicitly in the "What can be pickled and unpickled?" section of the pickle documentation:

Note that functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.
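Two consequences of that paragraph can be checked directly in a short Python 3 sketch: an attribute attached to a function is not part of the payload, and a function that cannot be found again by its qualified name, such as a lambda, cannot be pickled at all. The attribute name `note` is just an illustration, and the exact exception type and message for the lambda may vary between Python versions, so the sketch catches a few candidates.

```python
import pickle

def test_fn():
    return 42

# A function attribute: not saved, because only the name reference is pickled
test_fn.note = "attached after definition"
data = pickle.dumps(test_fn)
print("attribute stored:", b"attached after definition" in data)

# A lambda's qualified name is "<lambda>", which cannot be looked up in its module
anon = lambda: 0
lambda_failed = False
try:
    pickle.dumps(anon)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    lambda_failed = True
    print(type(exc).__name__, exc)
print("lambda pickled:", not lambda_failed)
```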

To go into some more detail for your specific example: the main script that Python executes is called the __main__ module, and you shelved the __main__.test_fn function. What is stored, then, is simply a marker signalling that you referenced a global, plus its import location: something close to GLOBAL followed by __main__ and test_fn. When the shelved data is loaded again, the GLOBAL marker makes the pickle module try to load the name test_fn from the __main__ module. Since your second session also runs as __main__ but doesn't have a test_fn global, loading the reference fails with the AttributeError you saw.
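The same failure can be reproduced in one Python 3 process without shelve. The sketch pickles a `__main__` function with protocol 2 (which, like the protocols Python 2 uses, emits the GLOBAL opcode), disassembles the stream with `pickletools.dis`, then deletes the global to mimic the second session and tries to load. It assumes it is run as a script, so that the function lives in `__main__`; deleting the global is only a stand-in for starting a fresh interpreter.

```python
import pickle
import pickletools

def test_fn():
    return

# Protocol 2 stores a function with the GLOBAL opcode: module name + function name
data = pickle.dumps(test_fn, protocol=2)
pickletools.dis(data)
ops = [(op.name, arg) for op, arg, _ in pickletools.genops(data)]

# Mimic the second session: same module name, but no test_fn global defined
del globals()["test_fn"]

load_error = None
try:
    pickle.loads(data)
except (AttributeError, pickle.UnpicklingError) as exc:
    load_error = exc
    print(type(exc).__name__, exc)
```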