NoobSter - 9 months ago
Node.js Question

Direct-to-S3 client file uploads: is there a concern about unused files being stored in your S3 bucket with this approach?

I was looking over the tutorial suggested by Heroku (for Node.js apps) on signing the S3 upload request on the server, then handling the upload directly from the client.

I wonder about the potential for many unused files accumulating in your S3 bucket.

In the Heroku tutorial, they run through a user-edit scenario. They state:

  • When a user selects an image to be uploaded, the upload to S3 is
    handled automatically and asynchronously with the process described
    earlier in this article. The image preview is then updated with the
    selected image once the upload is complete and successful.

  • The user then clicks the “submit” button, which posts the username,
    name and the URL of the uploaded image to the Node application to be
    checked and/or stored. If no image was uploaded by the user earlier
    the default avatar image URL is posted instead.

So, what if the user selects the image, then navigates to another page without finishing and submitting the user-profile information?

Is that just a wasted file stored in your S3 bucket?

I'm using a similar approach in my MEAN app, where on submit:

  • I upload the file to S3.

  • Then, with a promise, once the upload completes, I submit the user's
    profile information / post, with the S3 URL, to the server.
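
For reference, the flow above could be sketched roughly as follows. This is only an illustration: `signUpload`, `putFile`, and `saveProfile` are hypothetical helpers passed in by the caller (they stand in for the signing endpoint, the PUT to S3, and the profile POST), not functions from any library.

```javascript
// Sketch only: signUpload, putFile and saveProfile are hypothetical helpers
// supplied by the caller. They are assumptions, not part of any library.
async function uploadThenSubmit(file, profile, { signUpload, putFile, saveProfile }) {
  // 1. Ask the server for a signed request and the final object URL.
  const { signedRequest, url } = await signUpload(file.name, file.type);

  // 2. Upload the file straight to S3 from the client.
  await putFile(signedRequest, file);

  // 3. Only now tell the server about the object. If the user leaves or the
  //    connection drops before this call runs, the object stays in the
  //    bucket with no database record pointing at it.
  return saveProfile({ ...profile, avatarUrl: url });
}
```

The gap between steps 2 and 3 is where an unused file could be left behind.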

I have a similar concern here: what if the user leaves before the promise resolves, or there is a connection issue? Is there a risk of storing unused files?

Is this the typical way of handling this, with unused files just being part of the risk?

Answer

The fact that the upload occurs client side doesn't make this problem any better or worse than if you did it server side. You still need a mechanism for handling errors and keeping things in sync with your database.

How you handle this depends on your specific application's needs. @ceejayoz mentioned one method: using a temporary bucket. I prefer not to worry about the upload until it's done, and to handle the data update with a Lambda job. You could also run a sync process using list operations if you really wanted to, but I think that's typically unnecessary.
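
As a rough sketch of two of these options (one possible reading, not a definitive implementation): the first function pulls bucket and key out of an S3 "object created" notification, which is what a Lambda job would receive before updating the database; the second compares a bucket listing against the keys your database references. The function names, the `graceMs` grace period, and the idea of passing in `referencedKeys` are assumptions made for illustration. The `Key` and `LastModified` fields mirror the entries returned by S3 list operations.

```javascript
// Turn an S3 event notification into a list of { bucket, key }.
// Object keys in S3 event notifications are URL-encoded, with spaces sent
// as "+", so they are decoded before use.
function extractUploadedObjects(event) {
  return (event.Records || []).map((record) => ({
    bucket: record.s3.bucket.name,
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
  }));
}

// Hypothetical helper for a sync process: given the objects listed in the
// bucket and the keys the database knows about, return the keys that are
// unreferenced and older than a grace period (so uploads still in flight
// are not treated as orphans).
function findOrphanedKeys(listedObjects, referencedKeys, now, graceMs) {
  const referenced = new Set(referencedKeys);
  return listedObjects
    .filter((obj) => !referenced.has(obj.Key))
    .filter((obj) => now - new Date(obj.LastModified).getTime() > graceMs)
    .map((obj) => obj.Key);
}
```

A Lambda handler could call `extractUploadedObjects(event)` and write each key to the database, so a record only exists once the upload has actually finished. The orphan check would only be needed if you go the list-and-sync route.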