pbecker13 - 1 year ago
Python Question

Python UnicodeDecodeError

I am writing a Python program that reads the output of the DOS tree command, which has been redirected to a text file. When the loop reaches its 533rd iteration, Eclipse reports an error:

Traceback (most recent call last):
  File "E:\Peter\Documents\Eclipse Workspace\MusicManagement\InputTest.py", line 24, in <module>
    input = myfile.readline()
  File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3551: character maps to undefined

I have read other posts, but setting the encoding to latin-1 does not resolve the issue: it raises an error on another character, and the same happens when I try utf-8.

The following is the code:

import os
from Album import *

os.system("tree F:\\Music > tree.txt")

myfile = open('tree.txt')

albums = []
x = 0

while x < 533:
    if not input: break
    input = myfile.readline()
    if len(input) < 14:
        artist = input[4:-1]
    elif input[13] != '-':
        artist = input[4:-1]
    else:
        # album line: year at [8:12], '-' at [13], title from [15]
        albums.append(Album(artist, input[15:-1], input[8:12]))
    x += 1

for x in albums:
    print(x.artist + ' - ' + x.title + ' (' + str(x.year) + ')')

Answer Source

You need to figure out which encoding tree.com used; according to this post, that could be any of the MS-DOS codepages.

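You could go through each of the MS-DOS encodings; most of them have a codec in the Python standard library. I'd try cp437 and cp850 first: cp437 is the original IBM PC (US) codepage, and cp850 is the Western European MS-DOS codepage, the closest DOS counterpart to cp1252. (cp500, by contrast, is an EBCDIC codepage and not an MS-DOS one.)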
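To see why the default cp1252 codec fails on byte 0x81 while a DOS codepage does not, here is a rough sketch that decodes the same bytes with several candidates. The sample bytes and the try_codecs helper are made up for illustration, and the candidate list (cp1252, cp437, cp850) is my assumption about what is worth trying; your tree.txt may need a different codepage.

```python
# Hypothetical sample: one line as tree.com might write it, i.e. the
# box-drawing prefix followed by a name that contains byte 0x81.
sample = b"\xc3\xc4\xc4\xc4M\x81nchen"


def try_codecs(data, codecs=("cp1252", "cp437", "cp850")):
    """Return {codec name: decoded text, or None if decoding failed}."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This codec has no character assigned to one of the bytes.
            results[name] = None
    return results


results = try_codecs(sample)
for name, text in results.items():
    # ascii() keeps the output printable on consoles that cannot show
    # box-drawing characters.
    print(name, ascii(text))
```

A codec that decodes without an exception is not automatically the right one; check that the tree lines come out as box-drawing characters and that accented names look sensible.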

Pass the encoding to open():

myfile = open('tree.txt', encoding='cp437')
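A slightly fuller sketch of the same idea, assuming cp437 turns out to be the right codepage: open the file in a with block and iterate over its lines instead of counting iterations. The read_tree_lines helper and the temporary stand-in for tree.txt are hypothetical and only there to keep the example self-contained.

```python
import os
import tempfile


def read_tree_lines(path, encoding="cp437"):
    """Yield each line of the file, decoded with `encoding`, newline removed."""
    with open(path, encoding=encoding) as f:
        for line in f:
            yield line.rstrip("\n")


# Demo only: write a small stand-in for tree.txt as cp437 bytes with
# DOS line endings, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tree.txt")
    with open(demo_path, "wb") as f:
        f.write("├───Artist\r\n│   ├───2005 - Title\r\n".encode("cp437"))
    lines = list(read_tree_lines(demo_path))

print(len(lines), "lines read")
```

Iterating over the file object stops at end of file on its own, so the hard-coded limit of 533 is no longer needed.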

You really should look into using os.walk() instead of tree.com for this task, though; at the very least it would save you from having to deal with issues like these.
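As a rough sketch of the os.walk() route: the layout below (root/artist/"YYYY - Title") is an assumption inferred from the slicing in your code, and list_albums is a hypothetical helper, not something from your Album module. The demo builds a throwaway directory tree so it can run anywhere; in your case you would pass the real music folder instead.

```python
import os
import tempfile


def list_albums(music_root):
    """Return sorted (artist, year, title) tuples for root/artist/'YYYY - Title'."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(music_root):
        rel = os.path.relpath(dirpath, music_root)
        if rel == os.curdir:
            continue  # the root itself is neither an artist nor an album
        parts = rel.split(os.sep)
        if len(parts) == 2:
            artist, album_dir = parts
            # Assumed naming scheme: "YYYY - Title"
            year, _, title = album_dir.partition(" - ")
            found.append((artist, year, title))
            dirnames[:] = []  # do not descend below the album level
    return sorted(found)


# Demo only: a temporary tree standing in for the music folder.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Artist A", "2005 - First"))
    os.makedirs(os.path.join(root, "Artist B", "1999 - Other"))
    found = list_albums(root)

for artist, year, title in found:
    print(artist + " - " + title + " (" + year + ")")
```

Because os.walk() returns the names as Python strings, there is no text file to decode and no codepage to guess.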
