MrKickkiller - 7 months ago

Trouble with encoding of .265 files: Python script to split them into NAL units gives UnicodeDecodeError

While working on a project, I've hit a dead end.

Whenever I try to execute the following Python script with the arguments

-i Bitstreams/BasketballDrive.265


https://gist.github.com/anonymous/5393d6ec4d2c7f8431e2a97fd750a76d

where Bitstreams/BasketballDrive.265 is an encoded video file, I get a UnicodeDecodeError:

Traceback (most recent call last):
  File "C:/Users/Mathieu/Documents/Deel-4--Video-3/extractor.py", line 84, in <module>
    main()
  File "C:/Users/Mathieu/Documents/Deel-4--Video-3/extractor.py", line 79, in main
    extractLayers(args['inputFile'], args['outputFile'], args['temporalLayer'])
  File "C:/Users/Mathieu/Documents/Deel-4--Video-3/extractor.py", line 17, in extractLayers
    gesplit = split_file(voorsplit, "0x00".encode("cp1252"))
  File "C:/Users/Mathieu/Documents/Deel-4--Video-3/extractor.py", line 41, in split_file
    for block in iter(lambda: fp.read(BLOCKSIZE), ''):
  File "C:/Users/Mathieu/Documents/Deel-4--Video-3/extractor.py", line 41, in <lambda>
    for block in iter(lambda: fp.read(BLOCKSIZE), ''):
  File "C:\Users\Mathieu\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 192: character maps to <undefined>


(The error was produced without specifying an encoding in the open(INPUTFILENAME) call.)

If I check

sys.getdefaultencoding()

it returns

utf-8


Adding encoding="utf-8" to the open(INPUTFILENAME) call didn't work either.
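A likely reason the traceback shows cp1252 even though sys.getdefaultencoding() returns utf-8: that function reports the interpreter's default string encoding, not the encoding open() uses. Text-mode open() without an encoding argument falls back to locale.getpreferredencoding(False), which on Windows with Python 3.5 is typically the ANSI code page (cp1252 on many Western-European setups). The snippet below is a minimal sketch that just prints both values for comparison; newer Python versions (for example with UTF-8 mode enabled) may report something different.

```python
import locale
import sys

# The interpreter's default string encoding (always "utf-8" on Python 3).
print("sys.getdefaultencoding():", sys.getdefaultencoding())

# What text-mode open() falls back to when no encoding= is given
# (for example "cp1252" on many Windows setups of the Python 3.5 era).
print("locale.getpreferredencoding(False):", locale.getpreferredencoding(False))
```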

Python version: 3.5

Windows version: 8.1

Answer

Open the file in binary mode:

open(INPUTFILENAME, 'rb')

By default, Python 3 opens files in text mode, which means it tries to decode the contents into a str when reading. This is generally not what you want with a binary file such as a video bitstream; in binary mode, read() returns the raw bytes unchanged.
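To illustrate the text-mode point, the sketch below feeds a few made-up bytes, including the 0x8d from the traceback, through io.TextIOWrapper (the decoding layer that text-mode open() adds) and through plain binary reading. The bytes are arbitrary test data, not taken from BasketballDrive.265, and fails_to_decode is a hypothetical helper. Decoding fails with cp1252 (0x8d is undefined there) and also with utf-8 (a lone 0x8d is not a valid start byte), which is consistent with encoding="utf-8" not helping; binary mode simply returns the bytes.

```python
import io

# Arbitrary test bytes; 0x8d is the byte reported in the traceback.
RAW = b"\x00\x00\x01\x40\x8d\x01"


def fails_to_decode(data, encoding):
    """Return True if reading data through a text-mode layer raises UnicodeDecodeError."""
    # io.TextIOWrapper is the decoding layer text-mode open() puts over the file.
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding) as fp:
        try:
            fp.read()
        except UnicodeDecodeError:
            return True
    return False


for enc in ("cp1252", "utf-8"):
    print(enc, "fails:", fails_to_decode(RAW, enc))

# Binary mode: read() hands back the raw bytes without any decoding.
with io.BytesIO(RAW) as fp:
    data = fp.read()
print(type(data).__name__, data == RAW)
```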