Debpriya Seal - 1 year ago
Python Question

DictReader and UnicodeError

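import csv
import io

def openFile(fileName):
    try:
        # io.open is assumed here; the built-in open() takes no encoding argument in Python 2
        trainFile = io.open(fileName, "r", encoding="utf-8")
    except IOError as e:
        print ("File could not be opened: {}".format(e))
    else:
        trainData = csv.DictReader(trainFile)
        print trainData
        return trainData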

def computeTFIDF(trainData):
    bodyList = []
    print "Inside computeTFIDF"
    for row in trainData:
        for key, value in row.iteritems():
            print key, unicode(value, "utf-8", "ignore")
    print "Done"

if __name__ == "__main__":
    print "Main"
    trainData = openFile("../Data/TrainSample.csv")
    print "File Opened"
    computeTFIDF(trainData)


Traceback (most recent call last):
  File "C:\DebSeal\IUB MS Program\IUB Sem III\Facebook Kaggle Comp\Src\", line 62, in <module>
  File "C:\DebSeal\IUB MS Program\IUB Sem III\Facebook Kaggle Comp\Src\", line 42, in computeTFIDF
    for row in trainData:
  File "C:\Python27\lib\", line 104, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 215: ordinal not in range(128)

The input is a CSV file with 4 columns (with header).

OS: Windows 7 64 bit.

Using Python 2.x

I don't know what is going wrong here. I told it to ignore encoding errors, but it still throws the same error.

I think the error is thrown before control even reaches the decoding call.

Can anybody tell me where I am going wrong?

Answer Source

The Python 2 CSV module does not handle Unicode input.
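
The traceback in the question is consistent with each unicode line being turned into a byte string with the default ASCII codec before parsing. Below is a minimal sketch of that failure, assuming this is what happens internally; the explicit .encode("ascii") stands in for the implicit conversion, and to_ascii_bytes is a hypothetical helper, not part of the csv module:

```python
# -*- coding: utf-8 -*-
# Sketch only: the explicit encode below stands in for an implicit
# unicode -> byte-string conversion; to_ascii_bytes is a hypothetical helper.

def to_ascii_bytes(line):
    """Encode a unicode line with the ASCII codec."""
    return line.encode("ascii")

plain = u"id,body\n"
curly = u"1,\u201cquoted text\u201d\n"  # contains U+201C, as in the traceback

# Pure-ASCII text encodes without complaint.
ascii_ok = to_ascii_bytes(plain)

# A curly quote has no ASCII representation, so encoding fails.
try:
    to_ascii_bytes(curly)
    failure = None
except UnicodeEncodeError as exc:
    failure = exc  # same exception type the question reports
```

This would also explain why the "ignore" argument in the question has no effect: the exception is raised while iterating over the reader, before unicode(value, "utf-8", "ignore") is ever called.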

Open the file in binary mode, and decode after parsing it as CSV. This is safe for the UTF-8 codec as newlines, delimiters and quotes all encode to 1 byte.
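
The byte-safety claim can be checked directly. The following is a small sketch, with a made-up sample row and a hypothetical helper name, showing that the CSV structural characters are single bytes in UTF-8 and that the bytes of a multi-byte character never collide with them:

```python
# -*- coding: utf-8 -*-
# Sketch only: sample_row and non_ascii_bytes are made up for illustration.

def non_ascii_bytes(data):
    """Return the byte values in data that fall outside the ASCII range."""
    return [b for b in bytearray(data) if b >= 0x80]

sample_row = u"1,\u201cna\u00efve\u201d,caf\u00e9\r\n"
encoded = sample_row.encode("utf-8")

# CSV structural characters are ASCII, so each one encodes to a single byte.
structural = [u",", u'"', u"\r", u"\n"]
structural_sizes = [len(ch.encode("utf-8")) for ch in structural]

# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so none of them
# can be mistaken for a comma, quote, or newline byte.
multibyte = u"\u201c".encode("utf-8")
high_bytes = non_ascii_bytes(multibyte)

# Splitting the raw bytes on the comma byte yields the same fields
# as splitting the decoded text on the comma character.
fields_from_bytes = [part.decode("utf-8") for part in encoded.split(b",")]
fields_from_text = sample_row.split(u",")
```

The same does not hold for every encoding; UTF-16, for example, represents a comma as two bytes, so parsing the raw bytes first would not be safe there.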

The csv module documentation includes a UnicodeReader wrapper class in the example section that will do the decoding for you; it is easily adapted to the DictReader class:

import csv

class UnicodeDictReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.encoding = encoding
        self.reader = csv.DictReader(f, dialect=dialect, **kwds)

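    def next(self):
        row = self.reader.next()
        # Python 2.7+; use the configured encoding and pass None values through
        return {k: unicode(v, self.encoding) if v is not None else None
                for k, v in row.iteritems()}
        # < Python 2.7
        # return dict((k, unicode(v, self.encoding) if v is not None else None)
        #             for (k, v) in row.iteritems())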

    def __iter__(self):
        return self

Use this with the file opened in binary mode:

def openFile(fileName):
    try:
        trainFile = open(fileName, "rb")
    except IOError as e:
        print "File could not be opened: {}".format(e)
    else:
        return UnicodeDictReader(trainFile)