snowleopard - 4 months ago
Python Question

Using numpy.fromfile to read scattered binary data

There are several blocks in a binary file that I want to read using a single call to numpy.fromfile. Each block has the following format:

OES=[
('EKEY','i4',1),
('FD1','f4',1),
('EX1','f4',1),
('EY1','f4',1),
('EXY1','f4',1),
('EA1','f4',1),
('EMJRP1','f4',1),
('EMNRP1','f4',1),
('EMAX1','f4',1),
('FD2','f4',1),
('EX2','f4',1),
('EY2','f4',1),
('EXY2','f4',1),
('EA2','f4',1),
('EMJRP2','f4',1),
('EMNRP2','f4',1),
('EMAX2','f4',1)]


Here is the format of the binary:

Data I want (OES format repeating n times)
------------------------
Useless Data
------------------------
Data I want (OES format repeating m times)
------------------------
etc..


I know the byte offset between the data I want and the useless data. I also know the size of each data block I want.

So far, I have accomplished my goal by seeking on the file object f and then calling:

nparr = np.fromfile(f,dtype=OES,count=size)


So I have a different nparr for each data block I want, and I concatenate all the numpy arrays into one new array.

My goal is to have a single array with all the blocks I want without concatenating (for memory purposes). That is, I want to call nparr = np.fromfile(f,dtype=OES) only once. Is there a way to accomplish this goal?

Answer

That is, I want to call nparr = np.fromfile(f,dtype=OES) only once. Is there a way to accomplish this goal?

No, not with a single call to fromfile().

But if you know the complete layout of the file in advance, you can preallocate the array, and then use fromfile and seek to read the OES blocks directly into the preallocated array. Suppose, for example, that you know the file positions of each OES block, and you know the number of records in each block. That is, you know:

file_positions = [position1, position2, ...]
numrecords = [n1, n2, ...]

Then you could do something like this (assuming f is the file, already opened in binary mode):

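import numpy as np

# Preallocate one array large enough to hold every wanted record.
total = sum(numrecords)
nparr = np.empty(total, dtype=OES)
current_index = 0
for pos, n in zip(file_positions, numrecords):
    # Jump to the start of the block and copy its n records into the next free slot.
    f.seek(pos)
    nparr[current_index:current_index+n] = np.fromfile(f, count=n, dtype=OES)
    current_index += n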
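
Each np.fromfile call in that loop still returns a small temporary array that is then copied into nparr. If even that matters, one possible variation is to read the bytes straight into the preallocated memory with the file object's readinto method. The following is only a sketch under assumptions: the two-field REC dtype, the read_blocks helper, and the demo file layout are all made up for illustration, and the real OES dtype and block positions would take their place.

```python
import os
import tempfile

import numpy as np

# Hypothetical two-field record used only for this demo; the OES dtype
# from the question would be passed in the same way.
REC = np.dtype([("EKEY", "<i4"), ("FD1", "<f4")])


def read_blocks(f, file_positions, numrecords, dtype):
    """Read scattered record blocks from an open binary file into one array."""
    dtype = np.dtype(dtype)
    out = np.empty(sum(numrecords), dtype=dtype)
    raw = out.view(np.uint8)  # byte-level view sharing memory with out
    start = 0
    for pos, n in zip(file_positions, numrecords):
        nbytes = n * dtype.itemsize
        f.seek(pos)
        # Fill the destination bytes in place, with no intermediate array.
        got = f.readinto(raw[start:start + nbytes])
        if got != nbytes:
            raise EOFError(f"expected {nbytes} bytes at offset {pos}, got {got}")
        start += nbytes
    return out


def demo():
    """Write two wanted blocks separated by junk bytes, then read them back."""
    a = np.zeros(3, dtype=REC)
    a["EKEY"] = [1, 2, 3]
    a["FD1"] = [0.5, 1.5, 2.5]
    b = np.zeros(2, dtype=REC)
    b["EKEY"] = [4, 5]
    b["FD1"] = [3.5, 4.5]
    junk = b"\xff" * 10  # stands in for the "useless data"

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "blocks.bin")
        with open(path, "wb") as f:
            f.write(a.tobytes())
            f.write(junk)
            f.write(b.tobytes())
        positions = [0, a.nbytes + len(junk)]
        with open(path, "rb") as f:
            return read_blocks(f, positions, [len(a), len(b)], REC)


result = demo()
```

Either way the file is still visited once per block; the difference is only whether a per-block temporary array is created before the data lands in the final array.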