Ray Osborn - 6 months ago
Python Question

How do I override the str function without raising a UnicodeEncodeError?

I am puzzled that defining __str__ for a class seems to have no effect on using the str function on a class instance. For example, I read in the Django documentation that:


The print statement and the str built-in call __str__() to determine the human-readable representation of an object.


But that doesn't appear to be true. Here's an example from a module where text is always assumed to be unicode:

import six

class Test(object):

    def __init__(self, text):
        self._text = text

    def __str__(self):
        if six.PY3:
            return str(self._text)
        else:
            return unicode(self._text)

    def __unicode__(self):
        if six.PY3:
            return str(self._text)
        else:
            return unicode(self._text)


In Python 2, it gives the following behavior:

>>> a=Test(u'café')
>>> print a.__str__()
café
>>> print a # same error with str(a)
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-63-202e444820fd> in <module>()
----> 1 str(a)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)


Is there a way to overload the str function?

Answer

For Python 2, you are returning the wrong type from the __str__ method. You are returning unicode, while you must return str:

def __str__(self):
    if six.PY3:
        return str(self._text)
    else:
        return self._text.encode('utf8')

Because self._text is not already of type str, you'll need to encode it. Because you returned unicode instead, Python 2 is forced to encode it implicitly, and the default ASCII codec can't handle the non-ASCII é character.
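To make that implicit step concrete, here is a small sketch of the encoding that fails and the explicit one that works. It is written to run on Python 3 (an assumption made for illustration, where the text type is str rather than unicode), and the variable names are made up for the example:

```python
# Illustration only: mimic the implicit ASCII encode that Python 2 attempts
# when __str__ returns a unicode object. Written to run on Python 3.
text = u'caf\xe9'  # u'café'

try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    # The ASCII codec cannot represent é (U+00E9)
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds and yields bytes
utf8_bytes = text.encode('utf8')

print(ascii_failed)  # True
print(utf8_bytes)    # b'caf\xc3\xa9'
```

The byte string is the same value that the corrected __str__ returns on Python 2.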

Printing the object results in the right output only because my terminal is configured to handle UTF-8:

>>> a = Test(u'café')
>>> str(a)
'caf\xc3\xa9'
>>> print a
café
>>> unicode(a)
u'caf\xe9'
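The dependence on the terminal can be sketched by decoding those same bytes the way two differently configured terminals would. This is only an illustration, assuming Python 3 and using Latin-1 as a stand-in for a terminal that is not set up for UTF-8:

```python
# Illustration only: the same bytes interpreted under two encodings.
raw = b'caf\xc3\xa9'  # the bytes returned by str(a) in the session above

as_utf8 = raw.decode('utf-8')      # what a UTF-8 terminal would display
as_latin1 = raw.decode('latin-1')  # what a Latin-1 terminal would display

# ascii() keeps the output printable regardless of the local terminal
print(ascii(as_utf8))    # 'caf\xe9'      (4 characters: café)
print(ascii(as_latin1))  # 'caf\xc3\xa9'  (5 characters: mojibake)
```

Only the UTF-8 interpretation turns the two bytes back into a single é.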

Note that there is no __unicode__ method in Python 3; your if six.PY3 in that method is entirely redundant. The following would work too:

class Test(object):
    def __init__(self, text):
        self._text = text

    def __str__(self):
        if six.PY3:
            return self._text
        else:
            return self._text.encode('utf8')

    def __unicode__(self):
        return self._text

However, if you are using the six library, you'd be far better off using the @six.python_2_unicode_compatible decorator and defining only a Python 3 version of the __str__ method:

@six.python_2_unicode_compatible
class Test(object):
    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text

where it is assumed text is always Unicode. If you are working with Django, then you can get the same decorator from the django.utils.encoding module.
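For intuition, a decorator of this kind can be sketched roughly as follows. This is a simplified, hypothetical stand-in (the name py2_unicode_compatible_sketch is invented here), not the actual source of six or Django, and it assumes __str__ returns text:

```python
import sys

PY2 = sys.version_info[0] == 2


def py2_unicode_compatible_sketch(cls):
    """Hypothetical, simplified stand-in for the decorator's behavior."""
    if PY2:
        # Reuse the text-returning __str__ as __unicode__ ...
        cls.__unicode__ = cls.__str__
        # ... and make __str__ return UTF-8 encoded bytes instead.
        cls.__str__ = lambda self: self.__unicode__().encode('utf-8')
    # On Python 3 the class is returned unchanged.
    return cls


@py2_unicode_compatible_sketch
class Test(object):
    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text


a = Test(u'caf\xe9')
print(str(a) == u'caf\xe9')  # True on Python 3
```

Writing only the text-returning __str__ keeps the class body identical on both versions; the Python 2 byte encoding is handled in one place.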
