martina martina - 2 months ago 25
Python Question

Retrieving subfolder names in an S3 bucket with boto3

Using boto3, I can access my AWS S3 bucket:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')

Now, the bucket contains a first-level folder, which itself contains several sub-folders named with a timestamp, such as the 1456753904534 that appears in the listing output below.
I need to know the names of these sub-folders for another job I'm doing, and I wonder whether I could have boto3 retrieve those for me.

So I tried:

objs = bucket.meta.client.list_objects(Bucket='my-bucket-name')

which gives a dictionary whose key 'Contents' holds all the third-level files instead of the second-level timestamp directories. In fact, I get a list containing entries such as:

{u'ETag': '"etag"',
 u'Key': 'first-level/1456753904534/part-00014',
 u'LastModified': datetime.datetime(2016, 2, 29, 13, 52, 24, tzinfo=tzutc()),
 u'Owner': {u'DisplayName': 'owner', u'ID': ...},
 u'Size': size,
 u'StorageClass': 'storageclass'}

You can see that the specific files are retrieved (part-00014 in this case), while I'd like to get the name of the directory alone.
In principle I could strip out the directory name from all the paths but it's ugly and expensive to retrieve everything at third level to get the second level!

I also tried something reported here:

for o in bucket.objects.filter(Delimiter='/'):
    print(o.key)

but I do not get the folders at the desired level.

Is there a way to solve this?


S3 is an object store; it doesn't have a real directory structure. The "/" is rather cosmetic. One reason people want a directory structure is that it lets them maintain, prune, or add to a tree in their application. In S3, you treat such a structure as a sort of index or search tag.
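To illustrate that the "folders" are only parts of the key string, here is a minimal sketch that derives the second-level names from a list of keys by plain string splitting. The helper name second_level_names is hypothetical, and every key except the first is made up for the example. It is the client-side stripping the question wants to avoid, since it needs every key to be downloaded first.

```python
def second_level_names(keys, prefix):
    """Return the sorted, unique path segments found directly below `prefix`."""
    names = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Only keys with a further "/" sit inside a "sub-folder".
        if "/" in rest:
            names.add(rest.split("/", 1)[0])
    return sorted(names)


# Sample keys: the first comes from the question, the others are invented.
keys = [
    "first-level/1456753904534/part-00014",
    "first-level/1456753904534/part-00015",
    "first-level/1456753999999/part-00000",
    "other/file.txt",
]
print(second_level_names(keys, "first-level/"))
```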

To manipulate objects in S3, use boto3.client rather than boto3.resource. To list all objects:

import boto3

s3 = boto3.client("s3")
# A single call returns at most 1000 keys; larger buckets need pagination.
all_objects = s3.list_objects(Bucket='my-bucket-name')

A reminder about boto3: boto3.resource is a nice high-level API, but it doesn't give you the complete access to the underlying service operations that boto3.client does.
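To get only the second-level "folder" names, one option is to pass both Prefix and Delimiter to the client and read the CommonPrefixes entries of the response instead of Contents. The sketch below assumes the sub-folders live under the first-level/ prefix seen in the question. The helper name list_subfolders is hypothetical, and it uses the list_objects_v2 paginator rather than the list_objects call shown above so that more than 1000 results are handled.

```python
def list_subfolders(client, bucket, prefix):
    """Return the names of the "sub-folders" directly under `prefix`.

    `client` is expected to behave like boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    names = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # With a Delimiter, S3 groups keys sharing the next path segment
        # into CommonPrefixes instead of listing every object.
        for entry in page.get("CommonPrefixes", []):
            # entry["Prefix"] looks like "first-level/1456753904534/"
            names.append(entry["Prefix"][len(prefix):].rstrip("/"))
    return names


# Example call (needs AWS credentials, so it is left commented out):
# import boto3
# print(list_subfolders(boto3.client("s3"), "my-bucket-name", "first-level/"))
```

Note that the prefix should end with "/"; otherwise S3 would return the first-level folder itself as the only common prefix.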