amelies - 1 year ago
Python Question

Python struct.pack and write vs MATLAB fwrite

I am trying to port this bit of MATLAB code to Python:

MATLAB

function write_file(im,name)
fp = fopen(name,'wb');

M = size(im);

fwrite(fp,[M(1) M(2) M(3)],'int');
fwrite(fp,im(:),'float');

fclose(fp);


where im is a 3D matrix. As far as I understand, the function writes a binary file that starts with a header holding the matrix size (3 integers) and then writes im as a single column of floats (im(:) flattens the matrix in column-major order). In MATLAB this takes a few seconds for a 150MB file.
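To make the layout concrete, here is a minimal sketch of a reader for this format. It assumes MATLAB's 'int' and 'float' precisions correspond to 32-bit signed integers and 32-bit floats, written in native byte order (MATLAB's fwrite default). The function name read_image is hypothetical and not part of the original code.

```python
import numpy as np

def read_image(file_name):
    """Hypothetical reader for files produced by the MATLAB write_file above.

    Assumes: 3 x int32 header (the dimensions), then float32 data in
    column-major (Fortran) order, all in native byte order.
    """
    with open(file_name, 'rb') as f:
        # Header: three 4-byte ints written by fwrite(fp, [M(1) M(2) M(3)], 'int')
        dims = np.frombuffer(f.read(3 * 4), dtype=np.int32)
        shape = tuple(int(n) for n in dims)
        # Data: the rest of the file, written by fwrite(fp, im(:), 'float')
        data = np.frombuffer(f.read(), dtype=np.float32)
    # im(:) flattens column-first, so rebuild with order='F'
    # (frombuffer arrays are read-only; call .copy() if you need to modify it)
    return data.reshape(shape, order='F')
```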

Python

import struct
import numpy as np
def write_image(im, file_name):

    with open(file_name, 'wb') as f:
        l = im.shape[0]*im.shape[1]*im.shape[2]

        header = np.array([im.shape[0], im.shape[1], im.shape[2]])
        header_bin = struct.pack("I"*3, *header)
        f.write(header_bin)

        im_bin = struct.pack("f"*l,*np.reshape(im, (l,1), order='F'))
        f.write(im_bin)
        f.close()


where im is a NumPy array. This code works: I compared its output with the binary file written by MATLAB and they are identical. However, for the 150MB file it takes several seconds and tends to drain all the memory (in the linked image I stopped the execution to avoid that, but you can see how the usage builds up!).

[Image: memory usage]

This does not make sense to me, as I am running the function on a PC with 15GB of RAM. How can writing a 150MB file require so much memory?
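A small illustration of where the memory goes, using a tiny stand-in array instead of the real image (an assumption for demonstration only). In the struct.pack call above, the * unpacks np.reshape(im, (l, 1), order='F') into l separate arguments, each of which is its own one-element ndarray object, and the format string itself has l characters. For a 150MB float32 image that means tens of millions of Python objects, each carrying far more overhead than the 4 bytes of data it represents.

```python
import numpy as np

# Tiny stand-in for the real image (the real one is ~150MB)
im = np.zeros((2, 3, 4), dtype=np.float32)
l = im.shape[0] * im.shape[1] * im.shape[2]

# What *np.reshape(...) expands to in the struct.pack call:
# one separate ndarray object per element of the image
args = tuple(np.reshape(im, (l, 1), order='F'))
# The format string grows with the element count as well
fmt = "f" * l

print(len(args), type(args[0]).__name__, args[0].shape, len(fmt))
# 24 ndarray (1,) 24
```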

I'd be happy to use a different method, as long as it still lets me use different formats for the header (integers) and the data column (floats).
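For reference, one way to keep two formats while staying close to the struct-based code is a minimal sketch like the following: struct handles only the 3-integer header, and NumPy serializes the whole data column in a single call with tobytes(order='F'). It assumes float32 output is wanted (matching MATLAB's 'float' and the question's "f" format); the conversion and the bytes object each cost about one extra copy of the data, not one Python object per element.

```python
import struct
import numpy as np

def write_image(im, file_name):
    """Sketch: struct for the integer header, NumPy for the float data."""
    with open(file_name, 'wb') as f:
        # Header: three 4-byte unsigned ints, same "I" format as the question
        f.write(struct.pack("3I", *im.shape))
        # Data: float32 in column-major (Fortran) order, like MATLAB's im(:)
        f.write(np.asarray(im, dtype=np.float32).tobytes(order='F'))
```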

Answer

There is no need to use struct to save your array. numpy.ndarray has a convenience method for writing itself to a file in binary mode: ndarray.tofile. The following should be much more efficient than building a gigantic format string and unpacking every element of your array into a separate argument for struct.pack:

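import numpy as np

def write_image(im, file_name):
    with open(file_name, 'wb') as f:
        # Header: three int32 values, matching MATLAB's fwrite(..., 'int')
        np.array(im.shape, dtype=np.int32).tofile(f)
        # Data: float32 (MATLAB 'float'); the transpose gives column-major order
        im.T.astype(np.float32, copy=False).tofile(f)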

tofile always writes in row-major (C) order, while MATLAB uses column-major (Fortran) order. The simplest way around this is to save the transpose of the array: ndarray.T reverses the order of the axes, so writing im.T in C order produces exactly the bytes of im in Fortran order. ndarray.T returns a view (a wrapper object pointing to the same underlying data) rather than a copy, so this operation should not noticeably increase your memory usage.
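A quick check of both claims on a small example array (chosen only for illustration): the transpose shares memory with the original, and its C-order bytes match the original's Fortran-order bytes. tobytes(order='C') is used here because it produces the same bytes that tofile writes.

```python
import numpy as np

im = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# im.T reverses the axes and is a view, not a copy
t = im.T
print(t.shape)                  # (4, 3, 2)
print(np.shares_memory(t, im))  # True

# C-order bytes of the transpose == Fortran-order bytes of the original
print(t.tobytes(order='C') == im.tobytes(order='F'))  # True
```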
