awulll awulll - 7 months ago 46
Python Question

Reading unicode characters from file/sqlite database and using it in Python

I have a list of variable names containing Unicode characters, some of them for chemicals such as ozone gas: 'O\u2083'. All of them are stored in a SQLite database, which a Python program reads in order to display O₃. However, what I get back when reading is the literal text 'O\u2083'. The SQLite database is created from a CSV file that contains the string 'O\u2083', among others. I understand that \u2083 is not being stored in the database as a single Unicode character but as six separate characters (\, u, 2, 0, 8, 3). Is there any way to recognize Unicode escapes in this context? My first idea is to write a function that recognizes these character sequences and replaces them with the corresponding Unicode characters. Is anything like this already implemented?


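If you have a byte string (length 7) that holds the literal escape sequence, decode it with the unicode-escape codec. The session below is Python 2, where a plain string literal is a byte string and \u is not interpreted: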

>>> s = 'O\u2083'
>>> len(s)
7
>>> s
'O\\u2083'
>>> print(s)
O\u2083
>>> u = s.decode('unicode-escape')
>>> len(u)
2
>>> u
u'O\u2083'
>>> print(u)
O₃

Caveat: Your console/IDE used to print the character needs to use an encoding that supports the character or you'll get a UnicodeEncodeError when printing. The font must support the symbol as well.
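
On Python 3, str has no decode method, so the same idea needs a slightly different spelling. The following is a minimal sketch, not a definitive implementation: it assumes the stored escapes are all of the four-digit \uXXXX form, and the table name `variables`, the column `name`, and the helper `unescape_unicode` are hypothetical stand-ins for whatever your database and code actually use. It replaces only the escape sequences, so any real non-ASCII characters already in the text are left alone.

```python
import re
import sqlite3

# Matches a literal backslash, the letter u, and exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_unicode(text):
    # Hypothetical helper: turn each literal \uXXXX sequence into the
    # single character with that code point; leave everything else as is.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# In-memory database standing in for the one described in the question.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE variables (name TEXT)')
# The raw string stores the seven literal characters O \ u 2 0 8 3.
conn.execute('INSERT INTO variables VALUES (?)', (r'O\u2083',))

raw = conn.execute('SELECT name FROM variables').fetchone()[0]
fixed = unescape_unicode(raw)
conn.close()

print(len(raw), len(fixed))  # 7 2
```

If the values are guaranteed to be pure ASCII, `raw.encode('ascii').decode('unicode_escape')` is a shorter alternative, but it also interprets other backslash escapes such as \n. It may be cleaner still to apply the conversion once, while loading the CSV file, so that the database holds the real characters.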