xralf - 5 months ago
Python Question

Usage of unicode() and encode() functions in Python

I have a problem with the encoding of the path variable when inserting it into an SQLite database. I tried to solve it with the encode("utf-8") function, which didn't help. Then I used the unicode() function, which gives me type unicode.

print type(path) # <type 'unicode'>
path = path.replace("one", "two") # <type 'str'>
path = path.encode("utf-8") # <type 'str'> strange
path = unicode(path) # <type 'unicode'>


I finally got the unicode type, but I still get the same error that occurred when the path variable was of type str:


sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless
you use a text_factory that can interpret 8-bit bytestrings (like
text_factory = str). It is highly recommended that you instead just
switch your application to Unicode strings.


Could you help me solve this error and explain the correct usage of the encode("utf-8") and unicode() functions? I'm often fighting with them.

EDIT:

This execute() statement raised the error:

cur.execute("update docs set path = :fullFilePath where path = :path", locals())


I forgot to change the encoding of the fullFilePath variable, which suffers from the same problem, but I'm quite confused now. Should I use only unicode(), only encode("utf-8"), or both?

I can't use

fullFilePath = unicode(fullFilePath.encode("utf-8"))


because it raises this error:


UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position
32: ordinal not in range(128)


The Python version is 2.7.2.

Answer

You are using encode("utf-8") incorrectly. Python byte strings (the str type) have an encoding; Unicode strings do not. You can convert a Unicode string to a byte string with uni.encode(encoding), and a byte string to a Unicode string with s.decode(encoding) (or, equivalently, unicode(s, encoding)).
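As a rough illustration of the two directions, here is a minimal sketch. The sample value is made up; it was chosen because its first UTF-8 byte is 0xc5, the byte named in the error message. The snippet avoids unicode() and print so that it runs unchanged on Python 2.7 and Python 3. In Python 2, calling unicode(s) without an encoding decodes with the default ASCII codec, which is what the last step imitates.

```python
# Hypothetical sample text: u"\u0161" is a non-ASCII letter (s with caron).
text = u"\u0161koda"

# Unicode -> bytes: encode() produces a byte string in a specific encoding.
raw = text.encode("utf-8")       # the bytes 0xc5 0xa1 followed by "koda"

# bytes -> Unicode: decode() needs the encoding the bytes were written in.
round_trip = raw.decode("utf-8")

# Decoding the same bytes as ASCII fails, because 0xc5 is outside range(128).
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

The rule of thumb this suggests: decode once when bytes enter the program, work with Unicode strings internally, and encode only when bytes are required on the way out.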

If fullFilePath and path are currently of type str, you should figure out how they are encoded. For example, if the current encoding is UTF-8, you would use:

path = path.decode('utf-8')
fullFilePath = fullFilePath.decode('utf-8')
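If it is not always clear whether a value is already Unicode, a small helper can decode only when needed. This is a sketch, not part of the original answer: the name to_text is hypothetical, and the UTF-8 default is an assumption that has to match how the paths were actually encoded.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding byte strings if necessary.

    to_text is a hypothetical helper; the default encoding is an assumption.
    """
    # In Python 2, bytes is an alias of str, so this check matches byte strings.
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already a Unicode string: return it unchanged.
    return value


# Example with made-up paths: one byte string, one Unicode string.
path_from_bytes = to_text(u"/tmp/\u0161koda".encode("utf-8"))
path_from_text = to_text(u"/tmp/\u0161koda")
```

Both calls end up with the same Unicode value, so the rest of the code no longer has to care which form it received.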

If this doesn't fix it, the actual issue may be that you are not passing a Unicode string to your execute() call. Try changing it to the following:

cur.execute(u"update docs set path = :fullFilePath where path = :path", locals())
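To see the whole flow in one place, here is a self-contained sketch against an in-memory database. The docs table and path column come from the question; the one-column schema, the sample paths, and the explicit parameter dictionary (used instead of locals()) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema: the question only shows a docs table with a path column.
cur.execute(u"create table docs (path text)")

# Pretend the path arrived as UTF-8 bytes; decode it once, up front.
path = b"/tmp/\xc5\xa1koda/one.txt".decode("utf-8")
cur.execute(u"insert into docs (path) values (:path)", {"path": path})

# All further string work happens on Unicode values.
fullFilePath = path.replace(u"one", u"two")

# Same statement as in the question, with explicit Unicode parameters.
cur.execute(
    u"update docs set path = :fullFilePath where path = :path",
    {"fullFilePath": fullFilePath, "path": path},
)
conn.commit()

rows = cur.execute(u"select path from docs").fetchall()
conn.close()
```

Passing an explicit dictionary rather than locals() makes it easier to verify that every bound value is a Unicode string before it reaches sqlite3.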