mongolol mongolol - 6 months ago 53
Java Question

Iterate through Files in Google Cloud Bucket

I am attempting to implement a relatively simple ETL pipeline that iterates through files in a Google Cloud Storage bucket. The bucket has two folders: /input and /output.

What I'm trying to do is write a Java/Scala script that iterates through the files in /input and applies the transformation to those that are either not present in /output or have a later timestamp than their counterpart in /output. I've been looking through the Java API docs for a function I can use (as opposed to just calling gsutil ls ...), but haven't had any luck so far. Any recommendations on where to look in the docs?

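import java.net.URLEncoder
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport
import com.google.api.client.http.GenericUrl

def getBucketFolderContents(
    bucketName: String
) = {
  // getCredential is the asker's own helper; it is not shown in the question.
  val credential = getCredential
  val httpTransport = GoogleNetHttpTransport.newTrustedTransport()
  val requestFactory = httpTransport.createRequestFactory(credential)
  // The base URL is missing here; only the encoded bucket name survives.
  val uri = "" + URLEncoder.encode(bucketName, "UTF-8")
  val url = new GenericUrl(uri)
  // buildGetRequest takes the GenericUrl, not the raw string.
  val request = requestFactory.buildGetRequest(url)
  val response = request.execute()
  response
}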


Answer Source

You can list the objects under a folder by setting the prefix string on the object listing API. Listing results are sorted by object name, so you should be able to list both folders, walk through the two listings in order, and generate the diff list.
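
The in-order walk over the two listings could look roughly like the sketch below. It is only an illustration under some assumptions: the two listings have already been fetched (for example with the prefix set to input/ and output/), the folder prefix has been stripped from each name, and both lists are sorted by name in the same order that String.compareTo uses. The class StaleInputFinder, the type Listed, and the method needsProcessing are hypothetical names, not part of any Google library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of any Google library.
class StaleInputFinder {

    // One listed object: its name relative to the folder and its
    // last-update time in epoch milliseconds.
    static final class Listed {
        final String name;
        final long updatedMillis;

        Listed(String name, long updatedMillis) {
            this.name = name;
            this.updatedMillis = updatedMillis;
        }
    }

    // Walks two name-sorted listings in step and returns the input names
    // that are missing from the output or newer than their output counterpart.
    static List<String> needsProcessing(List<Listed> input, List<Listed> output) {
        List<String> result = new ArrayList<>();
        int o = 0;
        for (Listed current : input) {
            // Skip output entries that sort before the current input name.
            while (o < output.size() && output.get(o).name.compareTo(current.name) < 0) {
                o++;
            }
            boolean hasMatch = o < output.size() && output.get(o).name.equals(current.name);
            if (!hasMatch || current.updatedMillis > output.get(o).updatedMillis) {
                result.add(current.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Listed> input = Arrays.asList(
                new Listed("a.csv", 100L),
                new Listed("b.csv", 200L),
                new Listed("c.csv", 300L));
        List<Listed> output = Arrays.asList(
                new Listed("a.csv", 150L),
                new Listed("b.csv", 150L));
        // b.csv is newer than its output copy; c.csv has no output yet.
        System.out.println(needsProcessing(input, output)); // prints [b.csv, c.csv]
    }
}
```

Because both lists are sorted, each listing is traversed once, so the diff takes time linear in the total number of objects and needs no lookup table.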
