Seyren Windsor - 1 month ago
Python Question

Read a null-terminated string in Python

I'm trying to read a null-terminated string, but I'm having issues when unpacking a char and joining it onto a string.

This is the code:

def readString(f):
    str = ''
    while True:
        char = readChar(f)
        str = str.join(char)
        if (hex(ord(char))) == '0x0':
            break
    return str

def readChar(f):
    char = unpack('c',f.read(1))[0]
    return char


Now this is giving me this error:

TypeError: sequence item 0: expected str instance, int found


I'm also trying the following:

char = unpack('c',f.read(1)).decode("ascii")


But it throws me:
AttributeError: 'tuple' object has no attribute 'decode'

I don't even know how to read the chars and add them to the string. Is there a proper way to do this?

Answer

(edit version 2, added extra way at the end)

Maybe there are libraries out there that can help you with this, but as I don't know of any, let's attack the problem at hand with what we know.

In Python 2, bytes and str are basically the same thing. That changes in Python 3, where str is what Python 2 calls unicode and bytes is its own separate type. This means that in Python 2 you don't need to define a readChar, because no extra work is required, so I don't think you need the unpack function for this particular case. With that in mind, let's define the new readString:

def readString(myfile):
    chars = []
    while True:
        c = myfile.read(1)
        # stop at the null terminator, or at end of file (read returns '')
        if c == chr(0) or not c:
            return "".join(chars)
        chars.append(c)

Just like your code, I read one character at a time, but I save the characters in a list instead. The reason is that strings are immutable, so doing str += char results in unnecessary copies. When I find the null character, I return the joined string. chr is the inverse of ord: it gives you the character for a given ASCII value. This excludes the null character from the result; if you need it, just move the append...
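The error in the question ("expected str instance, int found") suggests Python 3, where read() on a binary file returns bytes rather than str. Here is a minimal sketch of the same loop for that case. The name read_cstring, the encoding parameter, and the in-memory sample bytes are my own assumptions for illustration, not part of the original code.

```python
import io


def read_cstring(stream, encoding="ascii"):
    """Read bytes up to (not including) the next null byte and decode them."""
    chunks = []
    while True:
        b = stream.read(1)
        # b"" means end of file; b"\x00" is the terminator.
        if b == b"" or b == b"\x00":
            return b"".join(chunks).decode(encoding)
        chunks.append(b)


# Synthetic stand-in for the start of a file, built in memory for the demo.
sample = io.BytesIO(b"sword\x00Sword_Wea_Dummy\x00\xcd\xcc\xcc=")
first = read_cstring(sample)
second = read_cstring(sample)
rest = sample.read()  # whatever follows the two strings is still unread
print(first, second, rest)
```

Comparing against b"\x00" and joining with b"".join keeps everything as bytes until the single decode at the end, which avoids mixing str and bytes.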

Now let's test it with your sample file.

For instance, let's try to read "Sword_Wea_Dummy" from it:

with open("sword.blendscn","rb") as archi:
    #let's simulate that some prior processing was done by
    #moving the file pointer
    archi.seek(6)
    string=readString(archi)
    print "string repr:", repr(string)
    print "string:", string
    print ""
    #and the rest of the file is there waiting to be processed
    print "rest of the file: ", repr(archi.read())

And this is the output:

string repr: 'Sword_Wea_Dummy'
string: Sword_Wea_Dummy

rest of the file:  '\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf@\x0e\xf3\xb1@ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'

Other tests:

>>> with open("sword.blendscn","rb") as archi:
        print readString(archi)
        print readString(archi)
        print readString(archi)


sword
Sword_Wea_Dummy
ÍÌÌ=p=Š4:¦6¿JÆ=
>>> with open("sword.blendscn","rb") as archi:
        print repr(readString(archi))
        print repr(readString(archi))
        print repr(readString(archi))


'sword'
'Sword_Wea_Dummy'
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6='
>>> 

Now that I think about it, you mention that the data portion has a fixed size. If that is true for every file, and all of them have the following structure

[unknown-size data][known-size data]

then that is a pattern we can exploit: we only need to know the size of the file, and we can get both parts smoothly as follows:

import os

def getDataPair(filename,dataSize):
    size = os.path.getsize(filename)
    with open(filename, "rb") as archi:
        unknow = archi.read(size-dataSize)
        know   = archi.read()
        return unknow, know

Knowing the size of the data portion (which I found by playing with the prior example), using it is simple:

>>> string_data, data = getDataPair("sword.blendscn", 80)
>>> string_data
'sword\x00Sword_Wea_Dummy\x00'
>>> data
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf@\x0e\xf3\xb1@ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
>>> string_data.split(chr(0))
['sword', 'Sword_Wea_Dummy', '']
>>>          
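In Python 3 the string section would be a bytes object, so the split has to use a bytes separator. A small sketch, assuming ASCII names; split_cstrings is a hypothetical helper, not something from the original answer:

```python
def split_cstrings(blob, encoding="ascii"):
    """Split a bytes blob of null-terminated strings into decoded strings."""
    parts = blob.split(b"\x00")
    # A blob that ends with a terminator yields one empty trailing piece; drop it.
    if parts and parts[-1] == b"":
        parts.pop()
    return [p.decode(encoding) for p in parts]


names = split_cstrings(b"sword\x00Sword_Wea_Dummy\x00")
print(names)
```

Dropping the empty trailing piece is a design choice: it removes the '' that split leaves behind after the last terminator.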

Now you can pass the rest of the file, contained in data, to the appropriate function to be processed.
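What that function looks like depends on the file format, which I don't know. Purely as an illustration, here is a Python 3 sketch that assumes the fixed-size block is a run of little-endian 32-bit floats. Both that layout and the helper name unpack_floats are assumptions; adjust the struct format string to whatever the real layout is.

```python
import struct


def unpack_floats(data):
    """Interpret data as consecutive little-endian 32-bit floats (an assumption)."""
    count, leftover = divmod(len(data), 4)
    if leftover:
        raise ValueError("data length is not a multiple of 4 bytes")
    return struct.unpack("<%df" % count, data)


# Synthetic 12-byte block built here for the demonstration, not real file data.
block = struct.pack("<3f", 0.5, 1.0, -2.0)
values = unpack_floats(block)
print(values)
```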