Braden - 3 years ago
Python Question

UnicodeDecodeError when parsing XML on mac but works on PC

When parsing an XML file with:

from lxml import etree

with open('cortex_full.xml', 'r') as infile:
    root = etree.parse(infile)


I am getting the UnicodeDecodeError below. This only happens on my Mac, though; if I parse the same file with the same script on my work PC, everything works fine.

  File "/Users/Desktop/CPET/xml_test2.py", line 5, in <module>
    root = etree.parse(infile)
  File "src/lxml/lxml.etree.pyx", line 3442, in lxml.etree.parse (src/lxml/lxml.etree.c:81701)
  File "src/lxml/parser.pxi", line 1832, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:118888)
  File "src/lxml/parser.pxi", line 1852, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:119171)
  File "src/lxml/parser.pxi", line 1747, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:117959)
  File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:112686)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105881)
  File "src/lxml/parser.pxi", line 702, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:107548)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:12152)
  File "src/lxml/parser.pxi", line 373, in lxml.etree._FileReaderContext.copyToBuffer (src/lxml/lxml.etree.c:103210)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 783: ordinal not in range(128)


This seems to be quite a common occurrence given the number of threads on here, but none of the suggested fixes work in this case. Any ideas for getting it to work? Full XML file here

Answer Source

Posting an answer that worked for me for future reference. Credit goes to @Burhan Khalid for the answer.

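You need to set the encoding to utf-8 when opening the XML file. In Python 3, open() in text mode falls back to the platform's default encoding when none is given, and the traceback shows that this default was ASCII on the Mac, so the first non-ASCII byte (0xc2) could not be decoded.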

with open('cortex_full.xml', 'r', encoding='utf-8') as infile:
    root = etree.parse(infile)
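
An alternative that may be worth trying is to open the file in binary mode, so Python does no decoding and the XML parser picks the encoding from the XML declaration. The sketch below is only an illustration under assumptions: sample.xml and its contents are hypothetical stand-ins for cortex_full.xml, and the fallback import of xml.etree.ElementTree is there just so the snippet runs where lxml is not installed.

```python
import os
import tempfile

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback (assumption): the standard library parser exposes a similar parse() call.
    import xml.etree.ElementTree as etree

# Hypothetical stand-in for cortex_full.xml: UTF-8 encoded, with one non-ASCII
# character (the micro sign, which UTF-8 stores as the bytes 0xc2 0xb5).
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<root><unit>\u00b5V</unit></root>'
).encode('utf-8')

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.xml')
    with open(path, 'wb') as outfile:
        outfile.write(xml_bytes)

    # Text mode with an ASCII codec mimics the Mac's default and fails on byte 0xc2.
    try:
        with open(path, 'r', encoding='ascii') as infile:
            infile.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Binary mode: Python hands raw bytes to the parser, which reads the
    # encoding from the XML declaration itself.
    with open(path, 'rb') as infile:
        root = etree.parse(infile).getroot()

unit_text = root.find('unit').text
print(ascii_failed, unit_text)
```

With this approach the script does not depend on the machine's default encoding, which is what differed between the Mac and the PC. Passing the filename directly to etree.parse() should behave the same way.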