Sidney Sidney - 12 days ago 8
Python Question

UnicodeDecodeError on Python 3.5 when doing SQL query

I was using Python 3.5 + pyodbc (on windows 10) to do apply some SQL queries on a server, and everything was doing fine like this:

In [1]: import pyodbc
...:
In [2]: server = 'tcp:192.168.100.76'
...: database = 'database'
...: username = 'username'
...: password = 'password'
...:
In [3]: cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+use
...: rname+';PWD='+ password)
...: cursor = cnxn.cursor()

In [29]: sqlQuery = "SELECT TOP 50 * " + \
...: "FROM dbo.DS10_CIBBASE " + \
...: "LEFT JOIN dbo.DS10_CIBCITY AS O_CITY ON dbo.DS10_CIBBASE.O_CITY=O_CITY.CITYCODE " + \
...: "LEFT JOIN dbo.DS10_CIBCITY AS B_CITY ON dbo.DS10_CIBBASE.B_CITY=B_CITY.CITYCODE " + \
...: "LEFT JOIN dbo.DS10_CIBMETH AS METHOD1 ON dbo.DS10_CIBBASE.METHOD1=METHOD1.METHCODE " + \
...: "LEFT JOIN dbo.DS10_CIBMETH AS METHOD2 ON dbo.DS10_CIBBASE.METHOD2=METHOD2.METHCODE " + \
...: "LEFT JOIN dbo.DS10_CIBMETH AS METHOD3 ON dbo.DS10_CIBBASE.METHOD3=METHOD3.METHCODE " + \
...: "LEFT JOIN dbo.DS10_CIBPERS ON dbo.DS10_CIBBASE.BASENO=dbo.DS10_CIBPERS.BASENO " + \
...: "LEFT JOIN dbo.DS10_CIBFEAT ON dbo.DS10_CIBPERS.FEATURE=dbo.DS10_CIBFEAT.FEATCODE " + \
...: "LEFT JOIN dbo.DS10_CIBOCCU ON dbo.DS10_CIBPERS.OCCUPATION= dbo.DS10_CIBOCCU.OCCUCODE " + \
...: "LEFT JOIN dbo.DS10_CIBEDUC ON dbo.DS10_CIBPERS.EDU=dbo.DS10_CIBEDUC.EDUCCODE " + \
...: "LEFT JOIN dbo.DS10_CIBCITY AS L_CITY ON dbo.DS10_CIBPERS.CITY=L_CITY.CITYCODE ;"

In [30]: cursor.execute(sqlQuery)
Out[30]: <pyodbc.Cursor at 0x262625e5750>


So far so good, and then I changed the query a little bit:

In [31]: sqlQuery0 = "SELECT TOP 50 * " + \
...: "FROM dbo.LAB1_EXP3_CARNO1 ; "

In [32]: cursor.execute(sqlQuery0)
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-32-769696011e11> in <module>()
----> 1 cursor.execute(sqlQuery0)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 0: invalid start byte


I could not figure why the previous query was fine, while the later encountered the UnicodeDecodeError, since all the queries were written with ASCII characters.. So what did I do wrong, and how could I get rid of this error?

cco cco
Answer

You're getting this because your data is not UTF-8 encoded (\U00A8 is "spacing diaeresis", and \xA8 is the same in Latin-1, but \xA8 is ลก in Latin-15. The error is not because the query has non-ascii characters (obviously, it doesn't), but because the result is returning a string of 8-bit characters that aren't UTF-8 encoded.

For SQL server, you can use cast or convert to change the varchar column causing the error into an nvarchar (UTF-16 Unicode) value.

Comments