Johannes Schwaninger - 6 months ago
Python Question

Python list items encoding

Why does the encoding change in Python 2.7 when I iterate over the items of a list?

test_list = ['Hafst\xc3\xa4tter', '']

Printing the list:

print(test_list)

gets me this output:

['Hafst\xc3\xa4tter', '']

So far, so good. But when I iterate over the list, like this:

for item in test_list:
    print(item)

I get this output:

Hafstätter

Why does the encoding change (or does it)? And how can I change the encoding within the list?


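The encoding isn't changing; these are just two different ways of displaying the same string. Printing a list shows the repr() of each item, which writes the non-ASCII bytes as escape codes for debugging, while printing an item directly writes the bytes themselves: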

>>> test_list = ['Hafst\xc3\xa4tter', '']
>>> print(test_list)
['Hafst\xc3\xa4tter', '']
>>> for item in test_list:
...     print(item)
...
Hafstätter
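
As a rough illustration of why the two outputs differ (a sketch, not part of the original answer): the text shown for a list is built from the repr() of each item, so rebuilding it by hand from repr() gives the same string. The byte literals are an assumption made so the snippet also runs on Python 3, where they stand in for Python 2.7's str, and the names list_display and manual_display are made up for the example.

```python
from __future__ import print_function

# Byte literals stand in for Python 2.7's str so this also runs on Python 3.
test_list = [b'Hafst\xc3\xa4tter', b'']

# What print(test_list) shows: the list's str(), built from repr() of each item.
list_display = str(test_list)

# Rebuilding that display by hand from repr() gives the identical string.
manual_display = '[' + ', '.join(repr(item) for item in test_list) + ']'

print(list_display)
print(manual_display)
print(list_display == manual_display)
```

On Python 3 each item is shown with a b prefix; on Python 2.7 the display matches the output in the question.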

But they are equivalent:

>>> 'Hafst\xc3\xa4tter' == 'Hafstätter'
True

If you want to see lists displayed with the non-debugging output, you have to generate it yourself:

>>> print("['" + "', '".join(test_list) + "']")
['Hafstätter', '']
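
If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: readable_list is a hypothetical helper name, it assumes the items are UTF-8 encoded byte strings (as in the question), and it does not escape quotes inside the items. Decoding each item with .decode() is also one way to get Unicode strings out of the list, which may be what "changing the encoding" was meant to achieve.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def readable_list(items, encoding='utf-8'):
    """Hypothetical helper: format a list of byte strings without escape codes."""
    # Decode every byte string to Unicode text (assumes the given encoding).
    decoded = [item.decode(encoding) for item in items]
    # Naive formatting: quotes inside the items are not escaped.
    return u"['" + u"', '".join(decoded) + u"']"


# Byte literals stand in for Python 2.7's str so this also runs on Python 3.
test_list = [b'Hafst\xc3\xa4tter', b'']
print(readable_list(test_list))
```

Whether the ä displays correctly still depends on the terminal's encoding.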

There is a reason for the debugging output. Two strings can look identical when printed and still contain different bytes:

>>> a = 'a\xcc\x88'
>>> b = '\xc3\xa4'
>>> a
'a\xcc\x88'
>>> print a,b   # should look the same, if not it is the browser's fault :)
ä ä
>>> a==b
False
>>> [a,b]      # In a list you can see the difference by default.
['a\xcc\x88', '\xc3\xa4']
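
To see what is going on in that last example, it may help to decode both byte strings. The sketch below assumes the bytes are UTF-8: the first string is the letter a followed by a combining diaeresis (two code points), the second is the single precomposed character ä. The standard library's unicodedata.normalize can bring both to the same form before comparing.

```python
from __future__ import print_function

import unicodedata

# Byte literals stand in for Python 2.7's str so this also runs on Python 3.
a = b'a\xcc\x88'.decode('utf-8')   # 'a' + U+0308 COMBINING DIAERESIS
b = b'\xc3\xa4'.decode('utf-8')    # U+00E4, precomposed a with diaeresis

print(len(a), len(b))                          # 2 1
print(a == b)                                  # False
print(unicodedata.normalize('NFC', a) == b)    # True
```

So the escape codes in the list display are what reveal that the two strings are stored differently, even though they render the same.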