matiasg - 1 year ago
Python Question

how to convert repr into encoded string

I have this string (coming from a file I can't fix):

In [131]: s
Out[131]: '\\xce\\xb8Oph'

This is close to the repr of a string encoded in utf8:

In [132]: repr('θOph'.encode('utf8'))
Out[132]: "b'\\xce\\xb8Oph'"

I need the original encoded string. I can do it with

In [133]: eval("b'{}'".format(s)).decode('utf8')
Out[133]: 'θOph'

But I would be ... sad? if there were no simpler option to get it. Is there a better way?

Answer

Your solution is OK; the only problem is that eval is dangerous when used on arbitrary input. Use ast.literal_eval instead, which only evaluates Python literals and so cannot run arbitrary code:

>>> s = '\\xce\\xb8Oph'
>>> from ast import literal_eval
>>> literal_eval("b'{}'".format(s)).decode('utf8')
'θOph'

With eval you are subject to:

>>> eval("b'{}'".format("1' and print('rm -rf /') or b'u r owned")).decode('utf8')
rm -rf /
'u r owned'

Since ast.literal_eval is the opposite of repr for literals, it is what you are looking for.
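
To make the difference concrete, here is a minimal sketch that wraps the literal_eval approach in a function and feeds it the same hostile input used in the eval example above. The helper name decode_escaped is hypothetical (it is not part of the question or of any library), and the sketch assumes the input contains no unescaped single quotes.

```python
from ast import literal_eval


def decode_escaped(s, encoding="utf8"):
    # Hypothetical helper: wrap the escaped text in a bytes literal,
    # let literal_eval parse it, then decode the resulting bytes.
    return literal_eval("b'{}'".format(s)).decode(encoding)


good = decode_escaped("\\xce\\xb8Oph")

# The hostile input from the eval example is rejected, not executed:
# the resulting expression is not a plain literal, so literal_eval raises.
try:
    decode_escaped("1' and print('rm -rf /') or b'u r owned")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print(good, rejected)
```

Nothing is printed by the injected print call here, because literal_eval refuses the expression before anything runs.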


If you have a file with escaped unicode, you may want to open it with the unicode_escape encoding as suggested in the answer by Ginger++. I will keep my answer because the question was "how to convert repr into encoded string", not "how to decode file with escaped unicode".
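
For reference, the unicode_escape route can also be applied to a string already in memory, without evaluating anything. This is only a sketch of that idea, not necessarily what the other answer does; it assumes the text consists of plain ASCII characters plus \xNN escapes, as in the question.

```python
s = "\\xce\\xb8Oph"

# 1. unicode_escape turns each \xNN escape into the code point U+00NN.
as_code_points = s.encode("latin-1").decode("unicode_escape")

# 2. latin-1 maps code points 0-255 back to the same byte values.
raw_bytes = as_code_points.encode("latin-1")

# 3. Those bytes are the actual UTF-8 data, so decode them as such.
text = raw_bytes.decode("utf8")

print(text)
```

The latin-1 round trip is needed because unicode_escape yields one character per escaped byte, not the UTF-8 decoded text.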
