user2560609 - 4 months ago
Python Question

How to convert a unicode list of tuples into utf-8 with python

My function returns a tuple, which is then assigned to a variable x and appended to a list.

x = (u'string1', u'string2', u'string3', u'string4')

The function is called multiple times and the final list consists of 20 tuples.

The strings within the tuple are in unicode and I would like to convert them to utf-8.

Some of the strings also include non-ASCII characters like ö, ä, etc.

Is there a way to convert them all in one step?


Use a nested list comprehension:

encoded = [[s.encode('utf8') for s in t] for t in resultsList]

This produces a list of lists containing byte strings of UTF-8 encoded data.
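If each row should stay a tuple rather than become a list, the same comprehension can be wrapped in tuple(). The following is a minimal sketch, not the only way to do it; results_list and its contents are hypothetical sample data standing in for the 20-tuple list from the question, and the non-ASCII characters are written as escapes so the snippet runs unchanged on Python 2.7 or 3.3+.

```python
# Hypothetical sample data; the real list would come from the asker's function.
results_list = [
    (u'Kaiserstra\xdfe', u'K\xf6ln'),   # u'\xdf' is ß, u'\xf6' is ö
    (u'string3', u'J\xe4rvi'),          # u'\xe4' is ä
]

# Same nested comprehension as above, but tuple(...) keeps each row a tuple.
encoded = [tuple(s.encode('utf-8') for s in t) for t in results_list]

# Decoding reverses the step and recovers the original unicode strings.
decoded = [tuple(b.decode('utf-8') for b in t) for t in encoded]
```

Here encoded[0][0] is the byte string 'Kaiserstra\xc3\x9fe', and decoded compares equal to results_list, so the conversion is lossless in both directions.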

If you print these lists, you'll see Python represent the contents of the byte strings as Python string literals: with quotes, and with any bytes that are not printable ASCII characters shown as escape sequences:

>>> l = ['Kaiserstra\xc3\x9fe']
>>> l
['Kaiserstra\xc3\x9fe']
>>> l[0]
'Kaiserstra\xc3\x9fe'
>>> print l[0]
Kaiserstraße

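This is normal: when the interpreter echoes a value, or shows the contents of a list, it uses the repr() form, which is meant for debugging. The \xc3 and \x9f escape sequences represent the two UTF-8 bytes C3 9F (hexadecimal) that are used to encode the German sharp s, ß (the small "ringel-es" character). Printing the string itself, as in the last line, writes the raw bytes to the terminal, which displays them as ß if it is configured for UTF-8.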
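To see that the escapes are only a display form, it can help to look at the bytes directly. This is a small illustrative sketch with made-up variable names, written to run on Python 2.7 or 3.3+; it shows that ß is one character in the unicode string but two bytes after encoding.

```python
street = u'Kaiserstra\xdfe'           # u'\xdf' is the single code point U+00DF (ß)
data = street.encode('utf-8')         # UTF-8 encoded byte string

char_count = len(street)              # 12 characters
byte_count = len(data)                # 13 bytes, because ß takes two bytes

# bytearray yields integers on both Python 2 and 3, so hex() works on each byte.
sharp_s_bytes = [hex(b) for b in bytearray(data)[10:12]]   # ['0xc3', '0x9f']

debug_form = repr(data)               # the escaped form the interpreter echoes
round_trip = data.decode('utf-8')     # back to the original unicode string
```

The escaped text in debug_form is just how the 13 bytes are displayed; decoding them returns the original 12-character string.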