Reed_Xia - 2 months ago
Python Question

Google Cloud Storage + Python: Any way to list objects in a certain folder in GCS?

I'm writing a Python program to check whether a file is in a certain folder of my Google Cloud Storage bucket. The basic idea is to get the list of all objects in the folder (a list of file names), then check whether the file abc.txt is in that list.

The problem is that Google seems to provide only one way to get the object list, which is uri.get_bucket(). See the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects

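import boto

# DOGS_BUCKET and GOOGLE_STORAGE are constants defined earlier in the linked tutorial.
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
    print('%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name))
    print(' "%s"' % obj.get_contents_as_string())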


The drawback of uri.get_bucket() is that it appears to fetch every object in the bucket first, which is not what I want. I only need the list of object names in a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.

Could someone help? I appreciate every answer!

Answer

You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:

import json

from googleapiclient import discovery

# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1')  # add http=whatever param if auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
while request is not None:
    response = request.execute()
    print(json.dumps(response, indent=2))
    # list_next is a method of the objects() collection, not of the request.
    request = client.objects().list_next(request, response)

Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list

And the Google Python API client is documented here: https://code.google.com/p/google-api-python-client/
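
To get from the listing to the yes/no check the question asks for, the response pages can be reduced to a simple membership test. The following is a minimal sketch, not part of the client library: object_names and folder_contains are hypothetical helper names, and sample_pages is hand-written data standing in for real responses. It assumes each page is a dict whose "items" entries carry a "name" field, as in the objects.list response, and that a page with no matches has no "items" key.

```python
def object_names(pages):
    """Yield every object name found in a sequence of objects.list responses."""
    for page in pages:
        # A page with no matching objects has no "items" key at all.
        for item in page.get("items", []):
            yield item["name"]


def folder_contains(pages, folder, filename):
    """Return True if `filename` sits directly inside `folder`."""
    target = folder.rstrip("/") + "/" + filename
    return any(name == target for name in object_names(pages))


# Hand-written sample pages standing in for real API responses (assumption).
sample_pages = [
    {"items": [{"name": "abc/myfolder/abc.txt"},
               {"name": "abc/myfolder/notes.txt"}]},
    {"items": [{"name": "abc/myfolder2/abc.txt"}]},
    {},
]

print(folder_contains(sample_pages, "abc/myfolder", "abc.txt"))      # True
print(folder_contains(sample_pages, "abc/myfolder", "missing.txt"))  # False
```

With the real client, the pages would be the response dicts collected inside the while loop. Note that the prefix "abc/myfolder" also matches names such as "abc/myfolder2/abc.txt", as the second sample page shows; passing the prefix with a trailing slash ("abc/myfolder/") should narrow the listing to that one folder.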