E.Tarrent - 1 month ago
Python Question

A question about html.unescape("&nbsp")

This is my code:

import os
import html

a = html.unescape("home&nbsp;-&nbsp;study")
b = "test"
print(a)
s = (a, b)
print(s)


And this is my result:

home - study
('home\xa0-\xa0study', 'test')


Why does the result print like this?

Answer

print calls str() on its argument. Containers such as tuples and lists do not define their own __str__, so they fall back to object.__str__, which delegates to the object's repr (tuple.__repr__ here). That repr in turn calls repr, not str, on each of its members.
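A minimal sketch of that delegation, using a hypothetical class Demo (not part of the question) whose __str__ and __repr__ return different text, so it is visible which one the tuple uses:

```python
class Demo:
    """Hypothetical class whose str() and repr() differ on purpose."""

    def __str__(self):
        return "from __str__"

    def __repr__(self):
        return "from __repr__"


item = Demo()
pair = (item, "test")

direct = str(item)    # str() on the object itself uses Demo.__str__
in_tuple = str(pair)  # str() on the tuple uses repr() for every member

print(direct)
print(in_tuple)
```

The second line printed shows the __repr__ text for the Demo instance and quotes around 'test', because both members were rendered with repr.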

The repr of a string shows non-printable characters as escape sequences instead of the characters themselves. The non-breaking space (U+00A0) is one of them, so it appears as \xa0:

print(repr(a))
'home\xa0-\xa0study'

To verify, try print(s[0]). When the str object in position 0 is passed to print directly, Python uses its __str__, which returns the string unchanged, so the non-breaking spaces are written as real characters rather than as \xa0 escapes.
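The difference can be checked side by side. This is only a sketch: it assumes the input string was "home&nbsp;-&nbsp;study", and the names escaped and plain are made up for illustration. The last print shows one way to output the tuple's members without going through repr:

```python
import html

# Assumed input: the &nbsp; entities become U+00A0 (non-breaking space).
a = html.unescape("home&nbsp;-&nbsp;study")
s = (a, "test")

escaped = repr(s[0])  # the form shown inside the tuple display
plain = str(s[0])     # the form print(s[0]) writes out

print(escaped)
print("\xa0".isprintable())  # repr escapes characters that are not printable

# Unpacking passes each member to print separately, so str() is used.
print(*s, sep=", ")
```

Joining the members with ", ".join(s) would work as well, as long as every member is already a string.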
