Manish Kumar Manish Kumar - 10 days ago 5
Python Question

loading dataset in jupyter notebook python

I running the below code in jupyter notebook python:

# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2


and then the below instructions:

# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print ('Training data shape: ', X_train.shape)
print ('Training labels shape: ', y_train.shape)
print ('Test data shape: ', X_test.shape)
print ('Test labels shape: ', y_test.shape)


By running the 2nd portion , I ma getting the below error:

---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-5-9506c06e646a> in <module>()
1 # Load the raw CIFAR-10 data.
2 cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
----> 3 X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
4
5 # As a sanity check, we print out the size of the training and test data.

C:\Users\lenovo\assignment1\cs231n\data_utils.py in load_CIFAR10(ROOT)
20 for b in range(1,6):
21 f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
---> 22 X, Y = load_CIFAR_batch(f)
23 xs.append(X)
24 ys.append(Y)

C:\Users\lenovo\assignment1\cs231n\data_utils.py in load_CIFAR_batch(filename)
7 """ load single batch of cifar """
8 with open(filename, 'rb') as f:
----> 9 datadict = pickle.load(f)
10 X = datadict['data']
11 Y = datadict['labels']

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)


How can I resolve this error? I am using Annaconda3 to run this code. It seems the above code has been writtern in Annaonda2 version. Any suugestion to fix these errors?

Just for more details:

I am trying to solve the assignment from the link: http://cs231n.github.io/assignments2016/assignment1/

Edit:

Adding the data_utils.py containing definition of load_CIFAR

import _pickle as pickle
import numpy as np
import os
from scipy.misc import imread

def load_CIFAR_batch(filename):
""" load single batch of cifar """
with open(filename, 'rb') as f:
datadict = pickle.load(f)
X = datadict['data']
Y = datadict['labels']
X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
Y = np.array(Y)
return X, Y

def load_CIFAR10(ROOT):
""" load all of cifar """
xs = []
ys = []
for b in range(1,6):
f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
X, Y = load_CIFAR_batch(f)
xs.append(X)
ys.append(Y)
Xtr = np.concatenate(xs)
Ytr = np.concatenate(ys)
del X, Y
Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
return Xtr, Ytr, Xte, Yte

Answer

The pickle file you are loading has most likely been generated with python 2.

Since there are fundamental differences in how pickle works in Python2 and Python3, you may attempt loading the file using latin-1 encoding, hich assumes direct mapping of 0-255 to chars.

This method requires some sanity check, since it's not guaranteed to produce coherent data.