offeltoffel - 2 months ago 16
Python Question

Reading Fortran binary (stream access) with np.fromfile or open & struct

The following Fortran code:

INTEGER*2 :: i, Array_A(32)
Array_A(:) = (/ (i, i=0, 31) /)

OPEN (unit=11, file = 'binary2.dat', form='unformatted', access='stream')
Do i=1,32
  WRITE(11) Array_A(i)
End Do
CLOSE (11)


produces stream-access binary output containing the numbers 0 to 31 as 16-bit integers. Each value takes up 2 bytes, so the values start at bytes 1, 3, 5, 7 and so on. access='stream' suppresses the record markers that Fortran otherwise writes around each record of an unformatted file (I need that to keep the files as small as possible).
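The layout described above can be reproduced from Python to compare it with the Fortran output. A minimal sketch, assuming a little-endian machine (as the hex dump indicates) and writing to a hypothetical file `binary2_py.dat` in the temp directory:

```python
import os
import tempfile

import numpy as np

# Hypothetical output path, in the temp directory so nothing else is touched
path = os.path.join(tempfile.gettempdir(), "binary2_py.dat")

# '<i2' = little-endian 16-bit signed integer, like INTEGER*2 on x86
values = np.arange(32, dtype='<i2')

# tofile writes the raw bytes only: no headers, no record markers
values.tofile(path)

size = os.path.getsize(path)
print(size)  # 64 = 32 values x 2 bytes
```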

Looking at it with a Hex-Editor, I get:

00 00 01 00 02 00 03 00 04 00 05 00 06 00 07 00
08 00 09 00 0A 00 0B 00 0C 00 0D 00 0E 00 0F 00
10 00 11 00 12 00 13 00 14 00 15 00 16 00 17 00
18 00 19 00 1A 00 1B 00 1C 00 1D 00 1E 00 1F 00


which is exactly what I expect (the second byte of each pair is always 00 here only because all the values in my example are below 256).
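Each pair of bytes is one little-endian 16-bit integer: low byte first, then high byte. A small sketch with the standard `struct` module (`'<h'` means little-endian signed short); the value 300 is only an illustration outside the example's range:

```python
import struct

# "1F 00" from the last row of the dump: low byte first, then high byte
value = struct.unpack('<h', b'\x1f\x00')[0]   # '<h' = little-endian int16
print(value)  # 31

# A value of 256 or more also uses the second byte: 300 = 0x012C -> "2C 01"
encoded = struct.pack('<h', 300)
print(encoded == b'\x2c\x01')  # True
```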

Now I need to read these binary files in Python 2.7, but I can't. I've tried several different approaches, and all of them fail.

Attempt 1: np.fromfile

import numpy as np

with open("binary2.dat", 'r') as f:
    content = np.fromfile(f, dtype=np.int16)


returns

[ 0 1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23
24 25 0 0 26104 1242 0 0]


Attempt 2: struct

import struct
with open("binary2.dat", 'r') as f:
    content = f.readlines()
struct.unpack('h' * 32, content)


raises

struct.error: unpack requires a string argument of length 64


because

print content
['\x00\x00\x01\x00\x02\x00\x03\x00\x04\x00\x05\x00\x06\x00\x07\x00\x08\x00\t\x00\n', '\x00\x0b\x00\x0c\x00\r\x00\x0e\x00\x0f\x00\x10\x00\x11\x00\x12\x00\x13\x00\x14\x00\x15\x00\x16\x00\x17\x00\x18\x00\x19\x00']


(note the split into two list elements and the \t and \n, which shouldn't be there according to what Fortran's stream access writes)
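`readlines()` splits the data after every 0x0A byte (newline), so `content` is a list of strings, while `struct.unpack('h' * 32, ...)` needs a single string of exactly 64 bytes. For comparison, a sketch of the same `struct` route reading the whole file at once in binary mode; `read_with_struct` and the test file name are hypothetical, and `'<'` forces the little-endian byte order seen in the dump:

```python
import os
import struct
import tempfile

def read_with_struct(path):
    """Hypothetical helper: read a stream file of little-endian int16 values."""
    with open(path, 'rb') as f:      # binary mode: bytes come back untouched
        data = f.read()              # one string of bytes, not a list of lines
    n = len(data) // 2               # number of complete 16-bit values
    return struct.unpack('<%dh' % n, data[:2 * n])

# Build a test file with the same 64-byte layout as binary2.dat
path = os.path.join(tempfile.gettempdir(), "binary2_struct.dat")
with open(path, 'wb') as f:
    f.write(struct.pack('<32h', *range(32)))

values = read_with_struct(path)
print(values[-1])  # 31
```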

Attempt 3: FortranFile

f = FortranFile("D:/Fortran/Sandbox/binary2.dat", 'r')
print(f.read_ints(dtype=np.int16))


With the error:

TypeError: only length-1 arrays can be converted to Python scalars


(Remember that it found a delimiter in the middle of the file; but it also fails for shorter files without a line break, e.g. with only the values 0 to 8.)
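`scipy.io.FortranFile` (presumably the `FortranFile` used here) is meant for sequential-access files, in which every record is wrapped in record-length markers. A stream-access file has no markers, so a marker-based reader ends up interpreting data bytes as record lengths. A sketch of the two layouts, assuming 4-byte little-endian markers (gfortran's default; other compilers may differ) and a hypothetical `sequential_record` helper:

```python
import struct

def sequential_record(payload):
    """Hypothetical helper: wrap one record the way sequential unformatted
    access commonly does (assumption: 4-byte little-endian length markers)."""
    marker = struct.pack('<i', len(payload))
    return marker + payload + marker

# What access='stream' writes: the 32 values back to back
stream_bytes = b''.join(struct.pack('<h', i) for i in range(32))

# What sequential access would write for the same loop: one record per WRITE
seq_bytes = b''.join(sequential_record(struct.pack('<h', i)) for i in range(32))

print(len(stream_bytes))  # 64
print(len(seq_bytes))     # 320: 8 bytes of markers around every 2-byte value

# A reader expecting markers takes the first 4 data bytes (00 00 01 00)
# of the stream file as a record length:
bogus_len = struct.unpack('<i', stream_bytes[:4])[0]
print(bogus_len)  # 65536, far more than the 64 bytes in the file
```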

Some additional thoughts:

Python seems to have trouble reading parts of the binary file. With np.fromfile it reads hex 19 (decimal 25) correctly, but goes wrong at hex 1A (decimal 26). It seems to be confused by the letters, although 0A, 0B ... work just fine.

For attempt 2, the content result is weird. The values 0 to 8 come out fine, but then there is this strange \t\x00\n thing. What is going on with hex 09?
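The \t and \n are not extra characters in the file: when Python prints a byte string, it displays the byte 0x09 as \t (tab) and 0x0A as \n (newline). A quick check using bytes taken from the dump:

```python
# Bytes 08 00 09 00 0A 00 from the dump, i.e. the int16 values 8, 9, 10
raw = b'\x08\x00\x09\x00\x0a\x00'

# repr shows 0x09 as \t and 0x0A as \n; the bytes themselves are unchanged
print(repr(raw))

print(raw[2:3] == b'\t')  # True: 0x09 is the tab character
print(raw[4:5] == b'\n')  # True: 0x0A is the newline character
print(len(raw))           # 6: nothing was added to the data
```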

I've been spending hours trying to find the logic, but I'm stuck and really need some help. Any ideas?

Answer

The problem is the mode you open the file in. The default mode is text ('r'); change it to binary ('rb'):

import numpy as np

with open("binary2.dat", 'rb') as f:
    content = np.fromfile(f, dtype=np.int16)

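and all the numbers will be read successfully. Text mode also explains where the data breaks off: on Windows, a text-mode read treats byte 0x1A (Ctrl-Z) as end of file, which is most likely why everything after 0x19 (25) is lost. See the Binary Files chapter of Dive Into Python for more details.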
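A slightly more explicit version of the fix, as a sketch: it passes the file name straight to `np.fromfile` (so no text-mode file object is involved) and spells out the byte order with `'<i2'` (little-endian int16, matching the hex dump) instead of the machine-native `np.int16`. The `read_stream_int16` name and the temporary test file are hypothetical:

```python
import os
import tempfile

import numpy as np

def read_stream_int16(path):
    """Hypothetical helper: read a Fortran stream file of INTEGER*2 values."""
    # Given a file name, np.fromfile opens the file in binary mode itself;
    # '<i2' states the little-endian byte order seen in the hex dump.
    return np.fromfile(path, dtype='<i2')

# Recreate a file with the same layout as binary2.dat for testing
path = os.path.join(tempfile.gettempdir(), "binary2_test.dat")
np.arange(32, dtype='<i2').tofile(path)

content = read_stream_int16(path)
print(content.tolist() == list(range(32)))  # True
```

The explicit `'<i2'` keeps the result correct even on a big-endian machine, where plain `np.int16` would be read as big-endian.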
