schoon schoon - 4 months ago 51
Python Question

SOLVED: UnicodeEncodeError: 'ascii' codec can't encode character error using writerow and map

In Python 2.7 and Ubuntu 14.04 I am trying to write to a csv file:

csv_w.writerow( map( lambda x: flatdata.get( x, "" ), columns ))


This gives me the notorious

UnicodeEncodeError: 'ascii' codec can't encode character u'\u265b' in position 19: ordinal not in range(128)


error.

The usual advice on here is to use unicode(x).encode("utf-8"). I have tried this, and also just .encode("utf-8"), for both parameters in the get:

csv_w.writerow( map( lambda x: flatdata.get( unicode(x).encode("utf-8"), unicode("").encode("utf-8") ), columns ))


but I still get the same error.

Any help in getting rid of the error is much appreciated. (I imagine the unicode("").encode("utf-8") is clumsy, but I'm still a newb.)

EDIT:
My full program is:

#!/usr/bin/env python
import json
import csv
import fileinput
import sys
import glob
import os

def flattenjson( b, delim ):
    val = {}
    for i in b.keys():
        if isinstance( b[i], dict ):
            get = flattenjson( b[i], delim )
            for j in get.keys():
                val[ i + delim + j ] = get[j]
        else:
            val[i] = b[i]
    return val

def createcolumnheadings(cols):
    #create column headings
    print ('a', cols)
    columns = cols.keys()
    columns = list( set( columns ) )
    print('b', columns)
    return columns

doOnce=True
out_file= open( 'Excel.csv', 'wb' )
csv_w = csv.writer( out_file, delimiter="\t" )
print sys.argv, os.getcwd()
os.chdir(sys.argv[1])
for line in fileinput.input(glob.glob("*.txt")):
    print('filename:', fileinput.filename(),'line #:',fileinput.filelineno(),'line:', line)
    data = json.loads(line)
    flatdata = flattenjson(data, "__")
    if doOnce:
        columns=createcolumnheadings(flatdata)
        print('c', columns)
        csv_w.writerow(columns)
        doOnce=False
    csv_w.writerow( map( lambda x: flatdata.get( unicode(x).encode("utf-8"), unicode("").encode("utf-8") ), columns ))


A redacted single tweet that throws the error (UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 14: ordinal not in range(128)) is available here.

SOLUTION: As per Alistair's advice, I installed unicodecsv.
The steps were:
Download the zip from here

Install it: sudo pip install /path/to/zipfile/python-unicodecsv-master.zip

import unicodecsv as csv
csv_w = csv.writer(f, encoding='utf-8')
csv_w.writerow(flatdata.get(x, u'') for x in columns)

Answer

Without seeing your data, it would seem that it contains Unicode data types (see How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte" for a brief explanation of Unicode vs. str types).

Your approach of encoding it is then error-prone: any str containing non-ASCII bytes will raise a UnicodeDecodeError when you call unicode() on it, because Python 2 decodes it with the ASCII codec by default (see the previous link for an explanation).
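A minimal sketch of that failure mode, for illustration only. In Python 2.7 the conversion through the ASCII codec happens implicitly (for example inside unicode(some_str)); the explicit encode/decode calls below mimic it and behave the same on Python 2 and 3. The helper name try_codec is hypothetical.

```python
def try_codec(func):
    """Call func and return the name of the Unicode error it raises, or None."""
    try:
        func()
    except UnicodeError as exc:
        return type(exc).__name__
    return None

queen_text = u"\u265b"                     # text (the unicode type in Python 2)
queen_bytes = queen_text.encode("utf-8")   # UTF-8 bytes (the str type in Python 2)

# Text pushed through the ASCII codec: the error from the question.
encode_error = try_codec(lambda: queen_text.encode("ascii"))

# Non-ASCII bytes decoded as ASCII: what unicode(queen_bytes) does in Python 2.
decode_error = try_codec(lambda: queen_bytes.decode("ascii"))

# Naming the right codec explicitly succeeds.
utf8_error = try_codec(lambda: queen_bytes.decode("utf-8"))
```

Here encode_error and decode_error hold the two exception names that usually show up in these questions, while utf8_error stays None.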

You should get all your data into Unicode types before writing to CSV. As Python 2.7's csv module does not support Unicode input, the simplest route is the drop-in replacement: https://github.com/jdunck/python-unicodecsv.
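One way to get everything into Unicode types first is a small normalizing helper. This is a sketch under assumptions: the name to_text, the UTF-8 default and the sample record are all made up for illustration. Strings returned by json.loads in Python 2 are already unicode, so this mainly matters for values that arrive from other sources as byte strings.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding."""
    text_type = type(u"")              # unicode on Python 2, str on Python 3
    if isinstance(value, text_type):
        return value                   # already text, nothing to do
    if isinstance(value, bytes):       # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return text_type(value)            # numbers, booleans, None, ...

# Hypothetical record mixing byte strings, text and a number.
mixed = {"name": b"\xe2\x99\x9b queen", "text": u"\u2022 bullet", "count": 3}
clean = dict((key, to_text(val)) for key, val in mixed.items())
```

After this step every value in clean is text, so nothing downstream has to guess at an encoding.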

You may also wish to break out your map into a separate statement to avoid confusion. Make sure to provide the full stack trace and examples of your code.
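Breaking the map out might look like the sketch below. The sample columns and flatdata values are made up, and build_row and encode_cell are hypothetical helper names. Written this way, it becomes visible that the attempt in the question encoded the lookup key and the default passed to get, not the value that get returns, and the value is what reaches the writer. Encoding each cell to UTF-8 bytes is only needed with Python 2.7's standard csv module; with unicodecsv the text row can be passed as it is.

```python
columns = [u"user__name", u"text", u"lang"]
flatdata = {u"user__name": u"\u265b queen", u"text": u"\u2022 hi"}

def build_row(record, cols):
    """Look up each column in turn, using an empty string when it is missing."""
    row = []
    for col in cols:
        value = record.get(col, u"")   # key and default stay as text
        row.append(value)
    return row

def encode_cell(cell, encoding="utf-8"):
    """Encode text cells to bytes; leave numbers, None and byte strings alone."""
    if isinstance(cell, type(u"")):
        return cell.encode(encoding)
    return cell

text_row = build_row(flatdata, columns)            # suitable for unicodecsv
byte_row = [encode_cell(c) for c in text_row]      # suitable for Python 2's csv
```

With the steps separated, a failing cell can be inspected or printed on its own line before it is written.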
