f126ck - 17 days ago
Python Question

Print Unicode string containing both accented characters and emoticons

I'm using Python to read a file that contains exactly the following line:

à è ì ò ù ç @ \U0001F914


where \U0001F914 is the Unicode escape sequence for an emoticon.

If I interpret the string as

string=string.decode('utf-8')


I get:

à è ì ò ù ç @ \U0001F914


while if I interpret it as follows:

string=string.decode('unicode-escape')


I get:

à è ì ò ù ç @

Answer

This may not be the best solution, but first you can call encode with 'unicode-escape' instead of decode, which gives:

# data stands for the line from the file, already decoded from UTF-8 to unicode
data = u'à è ì ò ù ç @ \\U0001F914'
print data.encode('unicode-escape')

\xe0 \xe8 \xec \xf2 \xf9 \xe7 @ \\U0001F914

Then you have to replace \\ with \ (in Python source these are written as '\\\\' and '\\'):

# data stands for the line from the file, already decoded from UTF-8 to unicode
data = u'à è ì ò ù ç @ \\U0001F914'
print data.encode('unicode-escape').replace('\\\\', '\\')

\xe0 \xe8 \xec \xf2 \xf9 \xe7 @ \U0001F914
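The doubled backslashes in the replace call are easy to misread, so here is a minimal sketch of what the two literals contain. The variable names are only illustrative, and the sample text is shortened to ASCII so it prints the same everywhere.

```python
double = '\\\\'   # source literal for two backslash characters
single = '\\'     # source literal for one backslash character

print(len(double))  # 2
print(len(single))  # 1

# The text "@ \\U0001F914" as produced by encode('unicode-escape'),
# written here as a source literal (each backslash doubled once more).
escaped = '@ \\\\U0001F914'
fixed = escaped.replace(double, single)
print(fixed)  # @ \U0001F914
```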

Then you can apply your decode with 'unicode-escape':

# data stands for the line from the file, already decoded from UTF-8 to unicode
data = u'à è ì ò ù ç @ \\U0001F914'
print data.encode('unicode-escape').replace('\\\\', '\\').decode('unicode-escape')

à è ì ò ù ç @ 🤔
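As a possible alternative, the escape can be expanded with a regular expression after the line has been decoded as UTF-8, so the accented characters never pass through 'unicode-escape'. This is only a sketch: expand_escapes and _ESCAPE are hypothetical names, the sample bytes are a shortened version of the line in the question, and it assumes the file uses only the eight-digit \U form.

```python
import re

# A literal backslash, the letter U, then exactly eight hex digits.
_ESCAPE = re.compile(r'\\U([0-9A-Fa-f]{8})')

def expand_escapes(text):
    """Expand literal backslash-U escapes (eight hex digits) in decoded text."""
    def _to_char(match):
        # Decode only the matched escape; the rest of the text is untouched.
        return match.group(0).encode('ascii').decode('unicode-escape')
    return _ESCAPE.sub(_to_char, text)

# Bytes as they would sit in a UTF-8 file: two accented letters, '@',
# and the escape spelled out as ten ASCII characters.
raw = b'\xc3\xa0 \xc3\xa8 @ \\U0001F914'
text = raw.decode('utf-8')
result = expand_escapes(text)
```

Printing result on a UTF-8 terminal should show the accented letters followed by the emoticon. An escape above U+10FFFF is not a valid code point, so the decode call inside _to_char would raise an error for it.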