Brandon Kuczenski - 4 months ago

python: Downloading and caching XML files - how to handle encoding declaration?

from urllib.request import urlopen
from lxml import objectify

I am trying to write a program that will download XML files into a cache and then open them using lxml's objectify module. If I download the files using urlopen, then I can read them in using objectify.fromstring just fine:

r = urlopen(my_url)
o = objectify.fromstring(r.read())

However, if I download them and write them to a file, I end up with an encoding declaration at the top of the file that objectify.fromstring doesn't like. To wit:

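# download the file
my_file = 'foo.xml'
r = urlopen(my_url)

# save locally (raw bytes, exactly as received)
with open(my_file, 'wb') as fp:
    fp.write(r.read())

# open saved copy (text mode, so fp.read() returns a str)
with open(my_file, 'r') as fp:
    o1 = objectify.fromstring(fp.read())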

results in
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
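The error can be reproduced without any download. This is a minimal sketch, assuming lxml is installed; the sample XML below is made up and only stands in for whatever the server returns. It shows that objectify.fromstring rejects a str that carries an encoding declaration but accepts the very same document as bytes:

```python
from lxml import objectify

# Made-up sample standing in for a downloaded file (raw bytes, as urlopen(...).read() returns).
XML_BYTES = b'<?xml version="1.0" encoding="UTF-8"?>\n<root><item>42</item></root>'

# Reading the file in text mode ('r') gives a str; lxml refuses a str that declares an encoding.
text = XML_BYTES.decode("utf-8")
try:
    objectify.fromstring(text)
    text_error = None
except ValueError as exc:
    text_error = str(exc)

# Reading in binary mode ('rb') gives bytes; lxml decodes them according to the declaration.
root = objectify.fromstring(XML_BYTES)

print("str input:", text_error)
print("bytes input:", root.item.text)
```

The declaration only makes sense for bytes: once the text has been decoded into a str, a second encoding is meaningless, which is why lxml refuses the combination instead of guessing.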

If I use
then that works fine, so I could go through and change all the client code to use
instead, but I feel like that is not the right approach. I have other XML files stored locally for which
works just fine; based on a cursory review they appear to have
I just don't know what the right resolution is here. Should I change the encoding when I save the file? Should I strip the encoding declaration? Should I fill my code with try ... except ValueError clauses? Please advise.


The file needs to be opened in binary mode rather than text mode.

with open(my_file, 'rb') as fp:  # b stands for binary
    o1 = objectify.fromstring(fp.read())  # fp.read() now returns bytes

This is what the exception message suggests: "... Please use bytes input ..."
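Applied to the caching use case, the idea is to keep the data as bytes at every step: write with 'wb', read with 'rb', and never decode in between. The following is a minimal sketch, not a definitive implementation. The cache directory name, the SHA-256 file naming, and the helpers cache_path and fetch_cached are hypothetical; the optional fetch argument exists only so the download can be swapped out (for example in tests).

```python
import hashlib
import os
from typing import Callable, Optional
from urllib.request import urlopen

from lxml import objectify

CACHE_DIR = "xml_cache"  # hypothetical cache location


def cache_path(url: str, cache_dir: str = CACHE_DIR) -> str:
    """Map a URL to a file name in the cache (hypothetical scheme: SHA-256 of the URL)."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".xml"
    return os.path.join(cache_dir, name)


def fetch_cached(url: str, cache_dir: str = CACHE_DIR,
                 fetch: Optional[Callable[[str], bytes]] = None):
    """Download url once, cache the raw bytes, and parse the cached copy."""
    path = cache_path(url, cache_dir)
    if not os.path.exists(path):
        if fetch is None:
            with urlopen(url) as r:
                data = r.read()  # raw bytes; the encoding declaration stays consistent
        else:
            data = fetch(url)  # hypothetical hook, e.g. for tests
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as fp:  # store exactly what the server sent
            fp.write(data)
    with open(path, "rb") as fp:  # binary mode: objectify receives bytes
        return objectify.fromstring(fp.read())
```

With this, fetch_cached(my_url) downloads on the first call and parses the cached file on later calls, and neither stripping the declaration nor catching ValueError is needed. As an alternative to reading the bytes yourself, objectify.parse(path).getroot() takes a file name and opens the file itself, which also avoids the text-mode decode.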