pbecker13 - 1 year ago
Python Question

Python UnicodeDecodeError

I am writing a Python program that reads the output of a DOS tree command saved to a text file. When I reach the 533rd iteration of the loop, Eclipse gives an error:

Traceback (most recent call last):
  File "E:\Peter\Documents\Eclipse Workspace\MusicManagement\InputTest.py", line 24, in <module>
    input = myfile.readline()
  File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3551: character maps to <undefined>

I have read other posts, and setting the encoding to latin-1 does not resolve this issue, as it raises a similar error on another character; the same happens when I try utf-8.

The following is the code:

import os
from Album import *

os.system("tree F:\\Music > tree.txt")

myfile = open('tree.txt')

albums = []
x = 0

while x < 533:
    if not input: break
    input = myfile.readline()
    if len(input) < 14:
        artist = input[4:-1]
    elif input[13] != '-':
        artist = input[4:-1]
    else:
        albums.append(Album(artist, input[15:-1], input[8:12]))
    x += 1

for x in albums:
    print(x.artist + ' - ' + x.title + ' (' + str(x.year) + ')')

Answer Source

You need to figure out what encoding tree.com used; according to this post, that could be any of the MS-DOS codepages.

You could go through each of the MS-DOS encodings; most of those have a codec in the Python standard library. I'd try cp437 and cp850 first; the former is the original IBM PC codepage, and the latter is the Western European MS-DOS codepage, roughly the DOS counterpart of cp1252.
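As a rough way to compare candidates, you can decode the byte from the traceback (0x81) with each codec and see which ones accept it. This is only a sketch: the helper name try_decodings and the candidate list are illustrative assumptions, not a complete set of MS-DOS codepages.

```python
# The byte reported in the traceback.
problem_byte = b"\x81"

# Candidate codecs (an illustrative selection, not a complete list).
candidates = ["cp1252", "cp437", "cp850"]


def try_decodings(data, encodings):
    """Return {encoding: decoded text, or None if decoding fails}."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


results = try_decodings(problem_byte, candidates)
for name, text in results.items():
    # ascii() keeps the output printable on consoles with a narrow encoding.
    print(name, "->", ascii(text))
```

A codec that decodes without error is not necessarily the right one: cp437 assigns a character to every byte value, so it never fails. Check that the decoded text shows the box-drawing characters and accented names you expect.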

Pass the encoding to open():

myfile = open('tree.txt', encoding='cp437')
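A minimal sketch of reading the file line by line with an explicit encoding, assuming cp437 turns out to be the right codec. The sample bytes are hypothetical stand-ins for real tree output, written to a temporary file only so the example runs on its own.

```python
import os
import tempfile

# Hypothetical stand-in for tree.txt: two lines, encoded the way a cp437
# console would write them.
sample_text = "\u251c\u2500\u2500\u2500Artist\n\u2502   \u2514\u2500\u2500\u25001999 - Title\n"
path = os.path.join(tempfile.mkdtemp(), "tree.txt")
with open(path, "wb") as f:
    f.write(sample_text.encode("cp437"))

# Open with an explicit encoding instead of the platform default (cp1252
# in the traceback above), and let the with block close the file.
lines = []
with open(path, encoding="cp437") as myfile:
    for line in myfile:
        lines.append(line.rstrip("\n"))

print(len(lines), "lines read")
```

Iterating over the file object stops at end of file by itself, so no line counter is needed.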

You really should look into using os.walk() instead of tree.com for this task, though; it would at least save you from having to deal with issues like these.
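Here is a sketch of the os.walk() approach. It assumes album folders are named "YYYY - Title" inside one folder per artist, which is inferred from the slicing in the question. The function name find_albums and the demo folder names are hypothetical, and plain tuples stand in for the Album class.

```python
import os
import tempfile


def find_albums(music_root):
    """Return (artist, title, year) tuples for folders named 'YYYY - Title'.

    Assumes a music_root/Artist/'YYYY - Title' layout; adjust the parsing
    if your folders are named differently.
    """
    albums = []
    for dirpath, dirnames, _filenames in os.walk(music_root):
        dirnames.sort()  # in-place sort gives a stable traversal order
        artist = os.path.basename(dirpath)
        for name in dirnames:
            year, sep, title = name.partition(" - ")
            if sep and len(year) == 4 and year.isdigit():
                albums.append((artist, title, int(year)))
    return albums


# Demo on a throwaway directory tree with made-up names.
root = tempfile.mkdtemp()
for artist, album in [
    ("Artist A", "1999 - First"),
    ("Artist A", "2003 - Second"),
    ("Artist B", "2010 - Debut"),
]:
    os.makedirs(os.path.join(root, artist, album))

albums = find_albums(root)
for artist, title, year in albums:
    print(artist + " - " + title + " (" + str(year) + ")")
```

Because os.walk() returns the names as str objects straight from the operating system, there is no text file to decode and no codepage to guess.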