RoiEX - 1 year ago
Python Question

read csv file inside package

Sorry if this question is stupid, but I'm kind of a Python newbie.

I'm trying to port a Python 2.7 code base to Python 3.4.

I found this code snippet which should iterate over a csv file inside the packed application.

PyDev tells me that pkg_resources.resource_stream is undefined, but the first line seems to work, and the third line then throws this error:

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

io = pkg_resources.resource_stream(__name__, "data.csv")
c = csv.reader(io)
for record in c:

I tried switching the method to resource_string, ResourceManager.resourceStream etc. but all I got was different errors.

Answer

pkg_resources.resource_stream returns a stream that reads in binary mode; it just returns the bytes read, and doesn't try to decode them using a particular encoding.
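To see why the csv module complains, here is a small sketch that reproduces the error without any package at all. It assumes an in-memory io.BytesIO object is a fair stand-in for the binary stream that resource_stream hands back; the sample data is made up for illustration.

```python
import csv
import io

# Assumption: io.BytesIO stands in for the binary stream from resource_stream.
sample_bytes = b"id,name\n1,alpha\n"

# Iterating over a binary stream yields bytes objects, never str.
first_line = next(iter(io.BytesIO(sample_bytes)))

# csv.reader in Python 3 expects str lines, so it rejects the bytes.
try:
    next(csv.reader(io.BytesIO(sample_bytes)))
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

print(type(first_line).__name__)
print(error_message)
```

In Python 2 the same code worked because str and bytes were the same type, which is why the problem only shows up after porting.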

Most tools for encoding and decoding text are found in the codecs module. To turn a binary reader into a text reader for a given encoding, use codecs.getreader: it returns a reader factory that you call with the binary stream. Since you are bundling this file yourself, you should know its encoding, which should probably be UTF-8. So you would write:

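import codecs
import csv
import pkg_resources

# resource_stream yields bytes, so wrap it in a UTF-8 decoding reader first.
stream = pkg_resources.resource_stream(__name__, "data.csv")
utf8_reader = codecs.getreader("utf-8")
c = csv.reader(utf8_reader(stream))
for record in c:
    print(record)  # replace with your own processing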
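As an alternative to codecs.getreader (not part of the answer above), the standard io.TextIOWrapper can also decode a binary stream on the fly. The sketch below is only an illustration: read_csv_rows is a hypothetical helper name, and io.BytesIO again stands in for the packaged data.csv.

```python
import csv
import io


def read_csv_rows(binary_stream, encoding="utf-8"):
    """Decode a binary stream as text and return its CSV rows as lists."""
    # newline="" lets the csv module handle line endings itself.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return list(csv.reader(text_stream))


# Assumption: this in-memory stream stands in for the real resource stream.
sample = io.BytesIO("name,city\nZoë,Köln\n".encode("utf-8"))
rows = read_csv_rows(sample)
print(rows)
```

Inside the package you would presumably pass pkg_resources.resource_stream(__name__, "data.csv") in place of the sample stream.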
