Matthew Gertner - 7 months ago
Python Question

How do I post non-ASCII characters using httplib when content-type is "application/xml"?

I've implemented a Pivotal Tracker API module in Python 2.7. The Pivotal Tracker API expects POST data to be an XML document and "application/xml" to be the content type.

My code uses urllib2/httplib to post the document as shown:

request = urllib2.Request(self.url, xml_request.toxml('utf-8') if xml_request else None, self.headers)
obj = parse_xml(self.opener.open(request))


This yields an exception when the XML text contains non-ASCII characters:

File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 809, in _send_output
msg += message_body
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 89: ordinal not in range(128)


As near as I can see, httplib._send_output is creating an ASCII string for the message payload, presumably because it expects the data to be URL encoded (application/x-www-form-urlencoded). It works fine with application/xml as long as only ASCII characters are used.

Is there a straightforward way to post application/xml data containing non-ASCII characters, or am I going to have to jump through hoops (e.g. using Twisted and a custom producer for the POST payload)?

Answer

You're mixing Unicode and bytestrings.

>>> msg = u'abc' # Unicode string
>>> message_body = '\xc5' # bytestring
>>> msg += message_body
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
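
In Python 2, concatenating a Unicode string with a bytestring makes the interpreter decode the bytestring with the ASCII codec first. A minimal sketch of that implicit step, written out explicitly (the 0xc5 byte matches the traceback above; the variable names are only illustrative):

```python
# UTF-8 bytes for U+0142; the first byte, 0xc5, is not valid ASCII.
message_body = b'\xc5\x82'

# Python 2 performs this decode implicitly when a unicode string and a
# bytestring are concatenated; doing it by hand raises the same error.
try:
    message_body.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the codec the bytes were actually written in succeeds.
text = message_body.decode('utf-8')
```

So the body itself is fine; the problem is that something on the other side of the `+=` is a Unicode string.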

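httplib builds the request by joining the request line and headers into one string and then appending the body. If the URL or any header is a Unicode string, that joined string becomes Unicode, and appending the UTF-8 encoded body triggers the implicit ASCII decode shown above. To fix it, make sure the contents of self.headers are properly encoded, i.e. all keys and values in the headers are bytestrings: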

self.headers = dict((k.encode('ascii') if isinstance(k, unicode) else k,
                     v.encode('ascii') if isinstance(v, unicode) else v)
                    for k,v in self.headers.items())

Note: the character encoding of the headers has nothing to do with the character encoding of the body.
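
To illustrate that separation, the body can be serialized to UTF-8 bytes on its own, whatever the headers contain. This sketch assumes the request document is built with xml.dom.minidom (the toxml('utf-8') call in the question suggests this, but it is a guess); the element name and text are made up for the example:

```python
from xml.dom import minidom

# Hypothetical document; 'story' and its text are placeholders.
doc = minidom.Document()
root = doc.createElement('story')
root.appendChild(doc.createTextNode(u'Z\u0142oty'))
doc.appendChild(root)

# Passing an encoding makes toxml() return encoded bytes rather than text,
# so the non-ASCII character is already UTF-8 encoded in the payload.
body = doc.toxml('utf-8')
```

The resulting body is a bytestring regardless of how the header keys and values were encoded.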

The same goes for self.url: it should be a bytestring too.
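
A small helper can apply the same conversion to the URL and the headers in one place. This is only a sketch: to_bytes and encode_headers are hypothetical names, not part of urllib2 or httplib, the URL is a placeholder, and text_type is defined so the snippet is not tied to the Python 2 `unicode` name:

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3


def to_bytes(value, encoding='ascii'):
    """Encode text strings to bytes; return anything else unchanged."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def encode_headers(headers):
    """Return a copy of headers whose text keys and values are bytestrings."""
    return dict((to_bytes(k), to_bytes(v)) for k, v in headers.items())


# Placeholder URL and header, for illustration only.
url = to_bytes(u'https://example.invalid/services/stories')
headers = encode_headers({u'Content-Type': u'application/xml'})
```

A URL that contains non-ASCII characters would fail the ASCII encode here; it would need to be percent-encoded first.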