We have a write-intensive table (on AWS RDS MySQL) from a legacy system and we'd like to stream every write event (insert or updated) from that table to kinesis. The idea is to create a pipe to warmup caches and update search engines.
Currently we do that using a rudimentar polling architecture, basically using SQL, but the ideal would be to have a push architecture reading the events directly from the transaction log.
Has anyone tried it? Any suggested architecture?
I've worked with some customers doing that already, in Oracle. Seems also that LinkedIn uses a lot that technique of streaming data from databases to somewhere else. They created a platform called Databus to accomplish that in an agnostic way - https://github.com/linkedin/databus/wiki/Databus-for-MySQL.
There is a public project in Github, following LinkedIn principles that is already streaming binlog from Mysql to Kinesis Streams - https://github.com/cmerrick/plainview
If you want to get into the nitty gritty details of LinkedIn approach, there is a really nice (and extensive) blog post available - https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying.
Last but not least, Yelp is doing that as well, but with Kafka - https://engineeringblog.yelp.com/2016/08/streaming-mysql-tables-in-real-time-to-kafka.html
Not getting into the basics of Kinesis Streams, for the sake of brevity, if we bring Kinesis Streams to the game, I don't see why it shouldn't work. As a matter of fact, it was built for that - your database transaction log is a stream of events. Borrowing an excerpt from Amazon Web Services public documentation: Amazon Kinesis Streams allows for real-time data processing. With Amazon Kinesis Streams, you can continuously collect data as it is generated and promptly react to critical information about your business and operations.
Hope this helps.