Muaz Muaz - 4 years ago 319
Python Question

Error in reading HDF5 using h5py

I have saved my dataset in HDF5 format, laid out as shown in the following image. There are several groups (4, 2, 40, etc.), and each group holds two datasets, Annotation and Features. I saved them successfully from code, but I am unable to load them back.

Strangely, the error occurs only when I try to read Annotation; reading Features works fine.

I am using the following code:

    import numpy as np

    dataSet = np.array([])
    annotation = np.array([])
    hdf5Object = readHDF5File('abc.hdf5','r')
    w = 2
    myGroup = hdf5Object[str(w)]

    dataSet = np.array(myGroup['Features'])
    annotation = np.array(myGroup['Annotation'])


Please help me out here; I have been struggling with this for a while now. Thanks.

View of HDF5 generated using ViTables 2.1

EDIT 1

I get the following error when I read Annotation:


    Traceback (most recent call last):
      File "xyz.py", line 76, in getAllData
        annotation = np.array(myGroup['Annotation'])
      File "/usr/lib/python2.7/dist-packages/h5py/_hl/group.py", line 153, in __getitem__
        oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
      File "h5o.pyx", line 173, in h5py.h5o.open (h5py/h5o.c:3403)
    KeyError: "unable to open object (Symbol table: Can't open object)"


EDIT 2

The HDF5 file was created in two steps. In the first step, Features were calculated and written as follows:

    features = <numpy array of thousand rows and 100 columns contains only floating numbers>
    w = 2
    f = h5py.File('abc.hdf5', 'a')
    myGroup = f[str(w)]
    myGroup.create_dataset('Features', data=features)
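
The two-step write described here could be sketched as follows. This is only an illustration, assuming Python 3 and a recent h5py. The helper name append_dataset, the temporary file path and the random arrays are hypothetical stand-ins, not the original code. It uses require_group because indexing a file with a group name that does not exist yet raises a KeyError.

```python
import os
import tempfile

import h5py
import numpy as np


def append_dataset(path, group_name, dataset_name, data):
    """Append one dataset to a group, creating the group if it is missing."""
    with h5py.File(path, "a") as f:
        group = f.require_group(group_name)
        group.create_dataset(dataset_name, data=data)


# Hypothetical stand-ins for the real features and annotations.
path = os.path.join(tempfile.mkdtemp(), "abc.hdf5")
features = np.random.rand(1000, 100)
annotation = np.random.rand(1000, 1)

# Two separate write steps, as in the question.
append_dataset(path, "2", "Features", features)
append_dataset(path, "2", "Annotation", annotation)

with h5py.File(path, "r") as f:
    stored_names = sorted(f["2"].keys())
```

Opening the file in a with block closes it after every step, so each append is flushed to disk before the next one starts.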


The file was appended to for different values of w, and the features were calculated at different times.

The same kind of procedure was used for the annotations. Annotation also contains only floating-point numbers.

EDIT 3

The following image shows the contents of Annotation and Features for one value of w. The left window is Annotation and the right one is Features.


Answer Source

I figured out that I was accessing the dataset with a plain string, whereas the dataset name had somehow been saved as unicode (UTF-8). When I convert the dataset name to unicode, it works fine.

How I figured out the key's data type:

    myGroup = hdf5Object[str(w)]
    childsIter = myGroup.iterkeys()
    for child in childsIter:
        print type(child)
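
A similar check can be sketched for Python 3, where print is a function and h5py returns member names as str. This is an illustration rather than the original code. The function name list_member_names and the small demo file are hypothetical. It uses repr() so that stray whitespace or unusual characters in a name, which print would hide, become visible.

```python
import os
import tempfile

import h5py
import numpy as np


def list_member_names(path, group_name):
    """Return (repr of name, type name) for every member of a group."""
    with h5py.File(path, "r") as f:
        return [(repr(name), type(name).__name__) for name in f[group_name]]


# Build a small hypothetical file with the same layout as in the question.
path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(path, "w") as f:
    group = f.create_group("2")
    group.create_dataset("Features", data=np.zeros((10, 100)))
    group.create_dataset("Annotation", data=np.zeros((10, 1)))

members = list_member_names(path, "2")
for shown_name, type_name in members:
    print(shown_name, type_name)
```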

This gave me the clue that the key of my dataset is of type unicode and not a plain string. So I converted my string to unicode as follows:

    key = unicode('Annotation', "utf-8")
    dS = np.array(myGroup[key])

or

    myGroup = hdf5Object[str(w)]
    childsIter = myGroup.iterkeys()
    for child in childsIter:
        dS = np.array(myGroup[child])
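
For readers on Python 3, where the unicode builtin no longer exists and every str is already Unicode, the second variant could look roughly like the sketch below. The function name load_group and the demo file are hypothetical, and the sketch assumes that every member of the group is a dataset, not a nested group.

```python
import os
import tempfile

import h5py
import numpy as np


def load_group(path, group_name):
    """Read every dataset in a group into memory, keyed by its stored name."""
    with h5py.File(path, "r") as f:
        group = f[group_name]
        # Iterating the group yields the names exactly as stored in the file;
        # indexing a dataset with () reads its whole content as a NumPy array.
        return {name: group[name][()] for name in group}


# Build a small hypothetical file with the same layout as in the question.
path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(path, "w") as f:
    group = f.create_group("2")
    group.create_dataset("Features", data=np.ones((10, 100)))
    group.create_dataset("Annotation", data=np.ones((10, 1)))

arrays = load_group(path, "2")
dataSet = arrays["Features"]
annotation = arrays["Annotation"]
```

Because the keys come from the file itself, this avoids any mismatch between a hand-typed name and the name that was actually stored.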