Ron Cohen - 2 months ago
Python Question

Python pickle calls cPickle?

I am new to Python. I am adapting someone else's code from Python 2.x to 3.5. The code loads a file via cPickle. I changed all occurrences of "cPickle" to "pickle", as I understand pickle superseded cPickle in Python 3. I get this error at run time:

NameError: name 'cPickle' is not defined


Pertinent code:

import pickle
import gzip
...
def load_data():
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = pickle.load(f, fix_imports=True)
    f.close()
    return (training_data, validation_data, test_data)


The error occurs in the pickle.load line when load_data() is called by another function. However, a) neither cPickle nor cpickle appears in any source file anywhere in the project any longer (searched globally) and b) the error does not occur if I run the lines within load_data() individually in the Python shell (however, I do get another data format error). Is pickle calling cPickle, and if so, how do I stop it?

Shell:
Python 3.5.0 |Anaconda 2.4.0 (x86_64)| (default, Oct 20 2015, 14:39:26)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin

IDE: IntelliJ 15.0.1, Python 3.5.0, anaconda

Unclear how to proceed. Any help appreciated. Thanks.

Answer

It looks like the pickled data you're trying to load was generated by a version of the program running on Python 2.7. The data itself is what contains the references to cPickle.

The problem is that pickle, as a serialization format, assumes that your standard library (and to a lesser extent your code) won't change layout between serialization and deserialization. It did change, a lot, between Python 2 and 3. And when that happens, pickle has no migration path.
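
To see why layout matters, it may help to look at what a pickle actually stores. The following is a minimal sketch; collections.OrderedDict is used purely as an illustrative stand-in and has nothing to do with the file in the question. The serialized bytes carry the module and class names as text, and loading looks the class up by those names.

```python
import pickle
from collections import OrderedDict

# Serialize an instance of a standard-library class.
payload = pickle.dumps(OrderedDict(a=1))

# The byte stream names the module and the class. At load time,
# pickle imports that module and looks the class up by name, so a
# renamed or moved module breaks loading.
has_module_name = b"collections" in payload
has_class_name = b"OrderedDict" in payload
print(has_module_name, has_class_name)
```

If the named module no longer exists under that name in the interpreter doing the loading, the lookup fails.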

Do you have access to the program that generated mnist.pkl.gz? If so, port it to Python 3 and re-run it to regenerate a Python 3-compatible version of the file.

If not, you'll have to write a Python 2 program that loads that file and exports it to a format that can be loaded from Python 3 (depending on the shape of your data, JSON and CSV are popular choices). Then write a Python 3 program that loads that format and dumps it as a Python 3 pickle. You can then load that pickle file from your original program.
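
The two steps might look roughly like this. It is only a sketch under assumptions: the function names and file names are hypothetical, JSON is picked as the intermediate format, and the data is assumed to be plain nested lists of numbers (NumPy arrays would first need converting, for example with tolist()). Under Python 2 you would load the original file with cPickle and pass the result to the export function; the demo at the bottom uses made-up sample data instead.

```python
import json
import os
import pickle
import tempfile


def export_to_json(data, path):
    """Step 1 (would run under Python 2): write the loaded data as JSON."""
    with open(path, "w") as f:
        json.dump(data, f)


def json_to_pickle(json_path, pickle_path):
    """Step 2 (runs under Python 3): read the JSON and re-pickle it."""
    with open(json_path) as f:
        data = json.load(f)
    with open(pickle_path, "wb") as f:
        pickle.dump(data, f)


# Demo with made-up sample data standing in for the real dataset.
sample = [[0.0, 0.5], [5, 0]]
workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, "export.json")    # hypothetical name
pickle_path = os.path.join(workdir, "data_py3.pkl")  # hypothetical name

export_to_json(sample, json_path)
json_to_pickle(json_path, pickle_path)

with open(pickle_path, "rb") as f:
    restored = pickle.load(f)
print(restored == sample)
```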

Of course, what you should really do is stop at the point where you are able to load the exported format from Python 3, and use that format as your actual, long-term storage format.

Using pickle for anything other than short-term serialization between trusted programs is something you should actively avoid. Loading a pickle is equivalent to running arbitrary code in your Python VM, and pickle also leaves you exposed to the exact situation you find yourself in now.
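
The "arbitrary code" point can be shown with a harmless sketch. The class name Innocuous is made up for illustration, and the built-in sorted stands in for whatever function someone crafting a pickle might choose.

```python
import pickle


class Innocuous:
    # __reduce__ tells pickle how to rebuild the object: "call this
    # callable with these arguments". The callable is chosen by whoever
    # created the pickle, not by the program that loads it.
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Innocuous())

# Loading does not give back an Innocuous instance; it calls sorted()
# because the data said so.
result = pickle.loads(payload)
print(result)  # [1, 2, 3]
```

Swap sorted for a function with side effects and the same mechanism runs it the moment the file is loaded, which is why pickles from untrusted sources should not be loaded.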
