Valeriov1992v - 8 months ago
Python Question

Python 2.7 - Unable to correctly decode email subject-header line

I'm using Python 2.7, and I am trying to properly decode the subject header line of an email. The source of the email is:

Subject: =?UTF-8?B?VGkgw6ggcGlhY2l1dGEgbGEgZGVtbz8gU2NvcHJpIGFsdHJlIG4=?=

I use the decode_header(header) function from the email.header module, and the result is:

[('Ti \xc3\xa8 piaciuta la demo? Scopri altre n', 'utf-8')]

The '\xc3\xa8' part should be the 'è' character, but it is not decoded or displayed correctly.
Another example:

Subject: =?iso-8859-1?Q?niccol=F2_cop?= =?iso-8859-1?Q?ernico?=


[('niccol\xf2 copernico', 'iso-8859-1')]

How can I obtain the correct string?


You are getting the correct string. It's just still encoded as bytes (UTF-8 in the first case, and iso-8859-1 in the second); you need to decode it with the reported charset to get the actual unicode string.

For example:

>>> print unicode('Ti \xc3\xa8 piaciuta la demo? Scopri altre n', 'utf-8')
Ti è piaciuta la demo? Scopri altre n


>>> print unicode('niccol\xf2 copernico', 'iso-8859-1')
niccolò copernico

That's why decode_header gives you back both the header data and its encoding.
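
If you want to do this for any header, a minimal sketch could look like the following. The helper name decode_subject is made up for illustration, and falling back to ASCII when no charset is reported is my assumption, not something the library prescribes.

```python
# -*- coding: utf-8 -*-
from email.header import decode_header


def decode_subject(raw):
    """Return the header as a unicode string (hypothetical helper)."""
    parts = []
    for data, charset in decode_header(raw):
        # Encoded words come back as byte strings plus the charset they
        # were encoded in; decode them with that charset.
        if isinstance(data, bytes):
            # Assumption: treat parts without a charset as ASCII and
            # replace anything that cannot be decoded.
            data = data.decode(charset or 'ascii', 'replace')
        parts.append(data)
    return u''.join(parts)


subject = decode_subject(
    '=?UTF-8?B?VGkgw6ggcGlhY2l1dGEgbGEgZGVtbz8gU2NvcHJpIGFsdHJlIG4=?=')
name = decode_subject(
    '=?iso-8859-1?Q?niccol=F2_cop?= =?iso-8859-1?Q?ernico?=')
```

Here subject and name hold the unicode strings for your two examples. Two caveats: an unknown charset name in a header would raise LookupError in decode(), and the whitespace between plain and encoded parts is not handled the same way by every Python version, so the simple join may need adjusting for mixed headers. The same module also provides make_header, which builds a Header object from the decode_header output and can be converted to a unicode string.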