user3422637 user3422637 - 7 months ago 32
SQL Question

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)

I am using Python 2.7 and MySQLdb 1.2.3. I tried everything I found on stackoverflow and other forums to handle encoding errors my script is throwing.
My script reads data from all tables in a source MySQL DB, writes them in a python

StringIO.StringIO
object, and then loads that data from
StringIO
object to Postgres database (which apparently is in UTF-8 encoding format. I found this by looking into Properties--Definition of database in pgadmin) using psycopg2 library's copy_from command.

I found out that my source MySQL database has some tables in latin1_swedish_ci encoding while others in utf_8 encoding format (Found this from TABLE_COLLATION in information_schema.tables).

I wrote all this code on the top of my Python script based on my research on the internet.

db_conn = MySQLdb.connect(host=host,user=user,passwd=passwd,db=db, charset="utf8", init_command='SET NAMES UTF8' ,use_unicode=True)
db_conn.set_character_set('utf8')
db_conn_cursor = db_conn.cursor()
db_conn_cursor.execute('SET NAMES utf8;')
db_conn_cursor.execute('SET CHARACTER SET utf8;')
db_conn_cursor.execute('SET character_set_connection=utf8;')


I still get the
UnicodeEncodeError
below with this line:
cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value
,

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)


I wrote the following line of code to clean cells in every table of source MySQL database when writing to StringIO object.

cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value


Please help.

Answer

str(cell) is trying to convert cell to ASCII. ASCII only supports characters with ordinals less than 255. What is cell?

If cell is a unicode string, just do cell.encode("utf8"), and that will return a bytestring encoded as utf 8

...or really iirc. If you pass mysql unicode, then the database will automagically convert it to utf8...

You could also try,

cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")

or just use a 3rd party library. There is a good one that will fix text for you.