Pranshu Pranshu - 26 days ago 7
Node.js Question

Automate redshift truncate/delete data after a retention period

I have a redshift table and it is storing a lot of data. Every weekend I go and manually using Workbench TRUNCATE last week of data that I no longer need.
I manually have to run

DELETE FROM tableName WHERE created_date BETWEEN timeStamp1 AND timeStamp2;

Is it possible to have some way to tell the table or have some expiration policy that removes the data every Sunday for me?

If not, Is there a way to automate the delete process every 7 days? Some sort of shell script or cron job in nodeJS that does this.

Answer Source

No, there is no in-built ability to run commands on a regular basis on Amazon Redshift. You could, however, run a script on another system that connects to Redshift and runs the command.

For example, a cron job that calls psql to connect to Redshift and execute the command. This could be done in a one-line script.

Alternatively, you could configure an AWS Lambda function to connect to Redshift and execute the command. (You would need to write the function yourself, but there are libraries that make this easier.) Then, you would configure Amazon CloudWatch Events to trigger the Lambda function on a desired schedule (eg once a week).

A common strategy is to actually store data in separate tables per time period (eg a month, but in your case it would be a week). Then, define a view that combines several tables. To delete a week of data, simply drop the table that contains that week of data, create a new table for this week's data, then update the view to point to the new table but not the old table.

By the way...

Your example uses the DELETE command, which is not the same as the TRUNCATE command.

TRUNCATE removes all data from a table. It is an efficient way to completely empty a table.

DELETE is good for removing part of a table but it simply marks rows as deleted. The data still occupies space on disk. Therefore, it is recommended that you VACUUM the table after deleting a significant quantity of data.