Danny Cullen - 28 days ago
Python Question

Removing hexadecimal characters from a unicode object

I am trying to remove the hexadecimal characters \xef\xbb\xbf from my string, but I am getting the following error. I'm not quite sure how to resolve this.

>>> x = u'\xef\xbb\xbfHello'
>>> x
u'\xef\xbb\xbfHello'
>>> type(x)
<type 'unicode'>
>>> print x
Hello
>>> print x.replace('\xef\xbb\xbf', '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
>>>

Answer

You need to pass unicode objects to replace. Otherwise Python 2 will attempt to decode the str arguments with the ascii codec so that it can compare them with x, and that fails because byte 0xef is outside the ASCII range.

>>> x = u'\xef\xbb\xbfHello'
>>> x
u'\xef\xbb\xbfHello'
>>> print(x.replace(u'\xef\xbb\xbf',u''))
Hello

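This only applies to Python 2. In Python 3, string literals are already unicode (and the u prefix is accepted again from Python 3.3 onwards), so both versions work.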
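For comparison, here is a minimal Python 3 sketch (assuming Python 3.3 or later) showing that both spellings of the replace call work, and that Python 3 raises a TypeError instead of silently decoding when bytes and str are mixed. As an aside, it also checks the likely origin of these three characters: they are the UTF-8 byte order mark decoded as Latin-1. The variable raw is a hypothetical input, not something from the question.

```python
# Python 3: plain str literals are unicode, so the u prefix is optional.
x = u'\xef\xbb\xbfHello'

# Both forms of the search string are the same type (str) in Python 3.
print(x.replace('\xef\xbb\xbf', ''))    # Hello
print(x.replace(u'\xef\xbb\xbf', u''))  # Hello

# Python 3 never coerces bytes to str implicitly: mixing them is a
# TypeError rather than Python 2's implicit ascii decode.
try:
    x.replace(b'\xef\xbb\xbf', '')
except TypeError as exc:
    print('TypeError:', exc)

# Hypothetical raw input, e.g. the first bytes of a UTF-8 file with a BOM.
raw = b'\xef\xbb\xbfHello'

# Encoding x as Latin-1 maps each code point back to one byte, recovering
# raw; those bytes are exactly U+FEFF (the BOM) followed by "Hello" in UTF-8.
assert x.encode('latin-1') == raw
assert '\ufeffHello'.encode('utf-8') == raw

# If you control the decoding step, the 'utf-8-sig' codec drops a leading BOM.
print(raw.decode('utf-8-sig'))          # Hello
```

If the string instead starts with an actual BOM character (u'\ufeff') rather than these three Latin-1 characters, x.lstrip(u'\ufeff') removes it.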