P-M - 2 months ago
Python Question

cPickle: UnpicklingError: invalid load key, 'A'

I have a pickle file which, when I unpickle it, throws an

UnpicklingError: invalid load key, 'A'.

exception. The exception is thrown regardless of whether I try to analyse the file on the Ubuntu 14.04 machine on which it was generated or on my Windows machine. The file contains 26 data points, and the exception is thrown after data point 11. I suspect I must have accidentally edited the file somehow, though I don't know when or how.

I know there are several other discussions of this sort of error, but I haven't yet found a post telling me whether and how I could recover the values after the faulty entry (I suspect one of the values is irretrievably lost). Is there any way I could skip it and carry on unpickling the next one? Can one, for example, unpickle in the reverse direction, i.e. last element first? Then I could work backwards until I hit the faulty entry and thus get the other values. (I could regenerate the data, but it would take a day or two, so I would rather avoid that if I can.)

This is the code for pickling:

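import cPickle as pickle  # the question states that cPickle is used (Python 2)

# 'ab' = append mode: each call to dump adds one more complete pickle to the end of the file.
# Protocol -1 selects the highest protocol available (2 on Python 2).
with open('hist_vs_years2.pkl', 'ab') as hist_pkl:
    pickle.dump(hist, hist_pkl, -1)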


And this is the code for unpickling:

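import cPickle as pickle  # the question states that cPickle is used (Python 2)

hist_vs_samples2 = []
more_values = True

with open('hist_vs_years2.pkl', 'rb') as hist_vs_samples_pkl:
    # Load one pickle per iteration until the end of the file raises EOFError.
    while more_values:
        try:
            hist_vs_samples2.append(pickle.load(hist_vs_samples_pkl))
        except EOFError:
            more_values = False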


I should add that I am using cPickle. If I try to unpickle using pickle I get the following error:

  File "C:\Anaconda2\lib\pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "C:\Anaconda2\lib\pickle.py", line 864, in load
    dispatch[key](self)
KeyError: 'A'

Answer

When you store multiple objects by calling dump repeatedly on the same file (rather than dumping one container that holds them all), pickle writes them sequentially as independent, self-contained pickles. So if one object is broken, it can be removed without corrupting the others.
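A minimal Python 3 sketch of this property: three objects are pickled separately, the way repeated pickle.dump calls on a file opened in 'ab' mode would write them, and the middle pickle is then cut out entirely. The remaining two still load. io.BytesIO stands in for the file here; it is an illustration, not a recovery tool.

```python
import io
import pickle

# Pickle three objects separately, as repeated pickle.dump() calls on a file
# opened in 'ab' mode would; each call produces a complete, independent pickle.
chunks = [pickle.dumps(obj, protocol=2) for obj in ([1, 2], {"x": 3}, "last")]

# Simulate cutting the middle pickle out of the file entirely.
stream = io.BytesIO(chunks[0] + chunks[2])

recovered = []
while True:
    try:
        recovered.append(pickle.load(stream))
    except EOFError:  # raised once the end of the stream is reached
        break

print(recovered)  # [[1, 2], 'last']
```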

The pickle format is only informally documented: in the source of pickle.py (the opcode list at the top of the module) and, in more detail, in pickletools.py. In most cases the opcodes are enough to piece together what is happening. Basically, a pickle file is a sequence of instructions for rebuilding objects on a small stack machine.
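The standard pickletools module can display these instructions directly. The Python 3 sketch below disassembles the protocol-0 pickle of the string "a" and also collects the opcode names with pickletools.genops, which walks the instructions without unpickling anything. Pointing the same tools at the bytes of a damaged file should show where the valid instructions stop, though the exact failure point depends on how the file was damaged.

```python
import pickle
import pickletools

data = pickle.dumps("a", protocol=0)

# Print an annotated listing: byte offset, opcode and argument of each instruction.
pickletools.dis(data)

# genops() yields (opcode_info, argument, byte_position) triples without printing.
ops = [(op.name, arg, pos) for op, arg, pos in pickletools.genops(data)]
print(ops)
```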

How readable a pickle file is depends on its protocol: protocol 0 is text-based and can be read by hand with some effort, while the binary protocols (1 and higher) are much harder. Whether you can repair a damaged entry or have to delete it depends largely on this. What is consistent across all protocols is that each individual pickle ends with a dot (.), the STOP opcode. For example, b'Va\np0\n.' and b'\x80\x04\x95\x05\x00\x00\x00\x00\x00\x00\x00\x8c\x01a\x94.' both encode the string "a", in protocol 0 and protocol 4 respectively.
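To find out which protocol a file uses, looking at its first two bytes is usually enough: protocols 2 and above begin with the PROTO opcode (byte 0x80) followed by the protocol number, while protocols 0 and 1 have no such header. The Python 3 sketch below checks this, together with the trailing STOP byte, for a few protocols; pickle_protocol is a hypothetical helper name, not part of the pickle module.

```python
import pickle


def pickle_protocol(header):
    """Guess a pickle's protocol from its first two bytes (hypothetical helper).

    Protocols 2 and above start with the PROTO opcode, byte 0x80, followed by
    the protocol number; protocols 0 and 1 have no such header.
    """
    header = bytearray(header[:2])
    if len(header) == 2 and header[0] == 0x80:
        return header[1]
    return None  # protocol 0 or 1, or a damaged start of file


for proto in (0, 1, 2, 4):
    data = pickle.dumps("a", protocol=proto)
    # Every protocol ends each pickle with the STOP opcode, b".".
    print(proto, pickle_protocol(data), data.endswith(b"."))
```

For a file written with protocol -1 under Python 2, pickle_protocol(open('hist_vs_years2.pkl', 'rb').read(2)) should return 2, since 2 is the highest protocol Python 2 supports.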

The simplest first step towards recovery is to count how many objects you can load:

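import pickle  # on Python 2: import cPickle as pickle

with open('/my/pickle.pkl', 'rb') as pkl_source:
    idx = 1
    while True:
        # Raises at the first broken object (or EOFError at the end of the file),
        # so the last number printed is the count of objects that loaded fine.
        pickle.load(pkl_source)
        print(idx)
        idx += 1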
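A slightly extended version of this loop also records the byte offset at which each good pickle ends, which tells you where the damaged region starts. This is a Python 3 sketch; count_loadable is a hypothetical helper name, and catching Exception broadly is a deliberate choice because corrupted data can surface as UnpicklingError, KeyError, ValueError and others.

```python
import io
import pickle


def count_loadable(stream):
    """Load pickles one after another until EOF or the first failure.

    Returns (objects, end_offsets, error): end_offsets[i] is the byte offset just
    past the i-th good pickle, and error is the exception that stopped loading
    (None if the stream ended cleanly).
    """
    objects, end_offsets = [], []
    while True:
        try:
            obj = pickle.load(stream)
        except EOFError:
            # Normal end of file (a pickle truncated at the very end also raises this).
            return objects, end_offsets, None
        except Exception as exc:  # corrupt data can raise many exception types
            return objects, end_offsets, exc
        objects.append(obj)
        end_offsets.append(stream.tell())


# Example: two good pickles, a damaged region starting with b"A", then one more pickle.
chunks = [pickle.dumps(v, protocol=2) for v in (10, "b", [3])]
objs, ends, err = count_loadable(io.BytesIO(chunks[0] + chunks[1] + b"Agarbage" + chunks[2]))
print(objs, ends, repr(err))
```

On a real file, open it in 'rb' mode and pass the file object instead of the BytesIO; the last entry of end_offsets is where the damaged data begins.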

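Then open the pickle file, skip that many objects, and remove everything from there up to and including the next '.' (the STOP opcode that should end the damaged pickle). Be aware that in the binary protocols, which is what protocol -1 selects, a '.' byte can also occur inside the pickled data itself, so the first '.' you find may not be the real end of the broken pickle; if loading still fails afterwards, cut up to the following '.' and try again.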
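The skip-to-the-next-dot procedure can be automated. The Python 3 sketch below loads pickles from raw bytes and, whenever a load fails, resumes right after the next '.' byte, retrying at the following '.' if the cut landed inside data. recover is a hypothetical helper name. A resumed load can occasionally succeed on a fragment and yield a spurious object, so check the recovered values for plausibility.

```python
import io
import pickle


def recover(data):
    """Load every pickle that can be loaded from the bytes `data`.

    When a load fails, skip everything up to and including the next b"." (a
    candidate STOP opcode) and try again from the following byte.
    """
    recovered = []
    stream = io.BytesIO(data)
    pos = 0
    while pos < len(data):
        stream.seek(pos)
        try:
            recovered.append(pickle.load(stream))
            pos = stream.tell()  # a successful load always consumes at least the STOP byte
        except Exception:
            # In binary protocols a '.' byte can also sit inside the data, so
            # this cut may land mid-pickle; the next attempt then fails as well
            # and the loop simply moves on to the following '.'.
            nxt = data.find(b".", pos)
            if nxt == -1:
                break  # no further STOP candidate: the rest is unrecoverable
            pos = nxt + 1
    return recovered
```

With data = open('hist_vs_years2.pkl', 'rb').read(), recover(data) should return the entries before and after the damaged one. It calls pickle.load at arbitrary offsets, so only use it on files you created yourself; under Python 3, a file written by Python 2 may additionally need pickle.load(stream, encoding='latin1'), which is commonly required for NumPy arrays.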