gignosko - 2 months ago
Python Question

Read a file line by line from S3 using boto?

I have a CSV file in S3 and I'm trying to read the header line to get the size (these files are created by our users, so they could be almost any size). Is there a way to do this using boto? I thought maybe I could use a Python BufferedReader, but I can't figure out how to open a stream from an S3 key. Any suggestions would be great. Thanks!

Answer

boto's Key object has a read() method that can do this. Here's some code that works for me:

>>> import boto.s3
>>> from boto.s3.key import Key
>>> # connect_s3()'s first positional argument is the access key ID, not a region
>>> conn = boto.s3.connect_to_region('ap-southeast-2')
>>> bucket = conn.get_bucket('bucket-name')
>>> k = Key(bucket)
>>> k.key = 'filename.txt'
>>> k.open()
>>> k.read(10)
'This text '

Each call to read(n) returns the next n bytes (at most) from the object, so repeated calls walk through the file without reading the whole object into memory.
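Because read(n) hands back successive chunks, you can wrap it in a small generator to step through a key piece by piece. This is a minimal sketch: `iter_chunks` is a hypothetical helper, and it assumes `read(n)` returns an empty bytes object once the data is exhausted. An `io.BytesIO` stands in for the opened key so the example runs without AWS credentials.

```python
import io


def iter_chunks(stream, chunk_size=8192):
    """Yield successive chunks from any object with a file-like read(n).

    Assumption: read(n) returns at most n bytes, and an empty bytes object
    once there is nothing left (true for io.BytesIO).
    """
    if chunk_size <= 0:
        # read(0) on a plain file-like object returns b"" and would never
        # make progress, so require a positive size.
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of data
        yield chunk


# Stand-in for an opened S3 key (hypothetical data).
fake_key = io.BytesIO(b"This text is longer than ten bytes")
chunks = list(iter_chunks(fake_key, 10))
print(chunks)
```

With a real key you would pass the `Key` object after calling `open()`; the generator itself only relies on `read(n)`.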

Of course, this won't return "the header line" by itself, but you can call it with a number large enough to cover the header and then split the result at the first newline.
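If you'd rather not guess a "large enough" number, one option is to keep reading small chunks until a newline shows up, then parse just that line with the `csv` module. This is a sketch under a few assumptions: `read_header_line` and `header_columns` are hypothetical helpers, "size" is taken to mean the number of columns, the file is UTF-8, and no quoted header field contains an embedded newline. `io.BytesIO` again stands in for the opened key.

```python
import csv
import io


def read_header_line(stream, chunk_size=1024, encoding="utf-8"):
    """Return the first line of a file-like object, reading only as needed.

    Reads chunk_size bytes at a time until a newline appears or the data
    runs out. Bytes read past the first newline are discarded.
    """
    buf = b""
    while b"\n" not in buf:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # no newline at all: the whole object is a single line
        buf += chunk
    line = buf.split(b"\n", 1)[0]
    # Splitting on the newline byte before decoding is safe for UTF-8,
    # since 0x0A never appears inside a multi-byte character.
    return line.rstrip(b"\r").decode(encoding)


def header_columns(stream, **kwargs):
    """Parse the header line as CSV and return the list of column names."""
    line = read_header_line(stream, **kwargs)
    return next(csv.reader([line]))


# Hypothetical CSV content; the quoted field contains a comma.
data = io.BytesIO(b'id,name,"city, country"\r\n1,Ann,"Oslo, NO"\r\n')
cols = header_columns(data, chunk_size=8)
print(len(cols), cols)  # 3 ['id', 'name', 'city, country']
```

Using `csv.reader` rather than `line.split(",")` keeps quoted commas from inflating the column count.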