Zelong Zelong - 12 days ago 7
Python Question

How to cache in IPython Notebook?

Environment:

  • Python 3
  • IPython 3.2

Every time I shut down an IPython notebook and re-open it, I have to re-run all the cells. But some cells involve intensive computation.

By contrast, knitr in R saves the results in a cache directory by default, so only new code and new settings trigger computation.

I looked at ipycache, but it seems to cache a single cell rather than the whole notebook. Is there a counterpart to knitr's cache in IPython?

Answer

Can you give an example of what you are trying to do? When I run something expensive in an IPython Notebook, I almost always write the result to disk afterward. For example, if my data is a list of JSON objects, I write it to disk as newline-separated JSON strings:

import json

# Write one JSON document per line; 'w' overwrites any stale cache file
with open('path_to_file.json', 'w') as file:
    for item in data:
        line = json.dumps(item)
        file.write(line + '\n')

You can then read the data back in the same way:

import json

data = []
# Read the file back line by line, parsing each line as JSON
with open('path_to_file.json', 'r') as file:
    for line in file:
        data_item = json.loads(line)
        data.append(data_item)

I think this is good practice in general because it gives you a backup. You can also use pickle for the same thing. If your data is really big, you can use gzip.open to write directly to a gzip-compressed file.
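
To make the pickle and gzip suggestions concrete, here is a minimal sketch using only the standard library. The sample data, the temporary directory, and the file names are hypothetical placeholders, not part of the original answer.

```python
import gzip
import json
import os
import pickle
import tempfile

# Hypothetical sample data standing in for an expensive result.
data = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

# Hypothetical scratch directory; any writable path would do.
cache_dir = tempfile.mkdtemp()
pickle_path = os.path.join(cache_dir, "data.pkl")
gzip_path = os.path.join(cache_dir, "data.json.gz")

# pickle: store any picklable Python object in binary form.
with open(pickle_path, "wb") as f:
    pickle.dump(data, f)
with open(pickle_path, "rb") as f:
    data_from_pickle = pickle.load(f)

# gzip: text mode ("wt"/"rt") lets you write and read JSON lines directly.
with gzip.open(gzip_path, "wt", encoding="utf-8") as f:
    for item in data:
        f.write(json.dumps(item) + "\n")
with gzip.open(gzip_path, "rt", encoding="utf-8") as f:
    data_from_gzip = [json.loads(line) for line in f]
```

Pickle handles arbitrary Python objects but should only be loaded from files you trust; the gzipped JSON lines stay readable by other tools.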

EDIT

To save a scikit-learn model to disk, use joblib.dump:

from sklearn.cluster import KMeans
# On older scikit-learn versions: from sklearn.externals import joblib
import joblib

# num_clusters and some_data are assumed to be defined earlier
km = KMeans(n_clusters=num_clusters)
km.fit(some_data)

# dump to pickle
joblib.dump(km, 'model.pkl')

# and reload from pickle
km = joblib.load('model.pkl')
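
The save-then-reload pattern can be wrapped in a small helper so that a cell skips the expensive step whenever a cached file already exists. This is only a sketch: load_or_compute, expensive_step, and the cache path are hypothetical names, and it uses the standard-library pickle rather than joblib to stay self-contained.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return the cached result at `path`, or call `compute()` and cache it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []  # records how many times the expensive step actually runs


def expensive_step():
    # Hypothetical stand-in for an intensive computation.
    calls.append(1)
    return sum(i * i for i in range(1000))


# Hypothetical cache location; in a notebook this could sit next to the .ipynb file.
cache_path = os.path.join(tempfile.mkdtemp(), "expensive_step.pkl")

first = load_or_compute(cache_path, expensive_step)   # computes and saves
second = load_or_compute(cache_path, expensive_step)  # loads from disk
```

Unlike knitr's cache, this helper does not notice when the code behind the result changes, so you would need to delete the cache file by hand to force a recomputation.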