I have a list of variable names containing Unicode characters, some of them for chemicals such as ozone gas: 'O\u2083'. All of them are stored in a sqlite database, which is read by Python code that should produce O₃. However, when I read the value back I get the literal text 'O\u2083'. The sqlite database is created from a csv file that contains the string 'O\u2083', among others. I understand that \u2083 is not being stored in the sqlite database as a single Unicode character but as 6 separate characters (\, u, 2, 0, 8, 3). Is there any way to recognize Unicode characters in this context? My first idea is to write a function that recognizes these character sequences and replaces them with the corresponding Unicode characters. Is there anything like this already implemented?
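If what you read back is a byte string of length 7 (the letter O followed by the six characters \, u, 2, 0, 8, 3), decode it with the unicode-escape codec. In Python 2: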
>>> s = 'O\u2083'
>>> len(s)
7
>>> s
'O\\u2083'
>>> print(s)
O\u2083
>>> u = s.decode('unicode-escape')
>>> len(u)
2
>>> u
u'O\u2083'
>>> print(u)
O₃
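The session above is Python 2, where str is a byte string with a .decode method. In Python 3, text read from sqlite is already a str and has no .decode, so one possible approach is the replacement function the question describes. The following is only a sketch: the name unescape_unicode is hypothetical, and it assumes the stored text uses only four-hex-digit \uXXXX sequences (it does not handle \UXXXXXXXX or other escapes, and it would also convert a sequence that was meant to stay literal).

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_unicode(text):
    """Replace each literal \\uXXXX sequence with the character it names.

    Hypothetical helper; text that contains no such sequence is returned
    unchanged, including any non-ASCII characters already present.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = 'O\\u2083'               # 7 characters, as read back from the database
fixed = unescape_unicode(raw)  # 2 characters: O and SUBSCRIPT THREE

print(len(raw), len(fixed))
print(ascii(fixed))            # ascii() keeps the output safe on any console
```

Applying the function once when the csv file is loaded, rather than on every read, would store the real character in the database, assuming you control the import step.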
Caveat: Your console/IDE used to print the character needs to use an encoding that supports the character or you'll get a UnicodeEncodeError when printing. The font must support the symbol as well.
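If the console encoding is outside your control, one way to avoid the UnicodeEncodeError is to degrade gracefully when printing. This is a minimal Python 3 sketch, not part of the original answer: the helper name printable is hypothetical, and it assumes that showing an escape sequence is an acceptable fallback when the character cannot be encoded.

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding cannot represent
    rewritten as backslash escapes (hypothetical helper).

    If no encoding is given, the console's encoding is used, falling
    back to ASCII when it is unknown.
    """
    encoding = encoding or sys.stdout.encoding or 'ascii'
    # backslashreplace substitutes unencodable characters instead of raising.
    return text.encode(encoding, errors='backslashreplace').decode(encoding)


# Prints the subscript on a UTF-8 console and an escape sequence otherwise.
print(printable('O\u2083'))
```

This only addresses the encoding half of the caveat; a font that lacks the glyph will still show a placeholder box.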