I am relatively new to Storm, and I am trying to create a topology that takes in a file, parses its contents, and then calls a third-party API to run some analytics on that content.
I have a topology with one spout and three bolts. The spout feeds the file to the first bolt, which extracts the file content. The second bolt runs the third-party analytics, and the last bolt writes everything to an XML string representation.
I have verified that the first two bolts work as expected, but the problem started when I added the last bolt. The second bolt takes a long time to run, and Storm appears to be timing out: the third-party code takes around 37 seconds per run, and I have read that after 30 seconds Storm times out the tuple emitted by the spout and fails it.
I keep seeing this in the logs:
17580 [Thread-9-disruptor-executor[3 3]-send-queue] INFO backtype.storm.util - Async loop interrupted!
That timeout, set by `topology.message.timeout.secs` (300 seconds in this case), means that once a spout emits a tuple, the topology has 300 seconds to finish processing that tuple and every tuple that ripples through the topology (through the three bolts) as a result of it.
If the spout emits a second tuple while the first bolt is still processing the first one, the clock is already ticking for the second tuple, even though it is only waiting in the queue.
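To make the "clock is already ticking" point concrete, here is a small back-of-the-envelope calculation in plain Java (not Storm code). It assumes every tuple is emitted at t = 0, that the 37-second analytics bolt from the question is the only thing that takes meaningful time, and it uses the 300-second timeout from this answer; `BacklogLatency` and `secondsUntilDone` are hypothetical names for illustration only.

```java
// Hypothetical illustration, not Storm code: the timeout clock starts at emit time.
public class BacklogLatency {

    // Seconds from emit to completion for the tuple at `position` (0-based) in the
    // queue of a slow bolt, assuming every tuple was emitted at t = 0 and each one
    // occupies one of `executors` executors for `processSecs` seconds.
    static long secondsUntilDone(int position, long processSecs, int executors) {
        long roundsAhead = position / executors; // full processing rounds before this tuple starts
        return (roundsAhead + 1) * processSecs;
    }

    public static void main(String[] args) {
        long processSecs = 37;  // per-tuple time of the third-party call, from the question
        long timeoutSecs = 300; // timeout from the answer
        for (int position = 0; position < 10; position++) {
            long elapsed = secondsUntilDone(position, processSecs, 1);
            System.out.printf("tuple %d: %d s %s%n", position + 1, elapsed,
                    elapsed > timeoutSecs ? "TIMED OUT" : "ok");
        }
    }
}
```

With a single executor, the 9th tuple finishes 9 × 37 = 333 seconds after it was emitted, so it exceeds the 300-second timeout even though the bolt itself spent only 37 seconds on it.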
To avoid the timeouts, you therefore need to either:
1) increase the parallelism hint for the bolts so that no backlog builds up and delays the processing of any tuple the spout emits, or
2) set the topology.max.spout.pending property to limit how many tuples the spout can have in flight before it must wait for one of them to complete.
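The effect of both remedies can be compared with a toy simulation in plain Java. This is a hypothetical model, not Storm's scheduler: it assumes a single slow bolt dominates the latency, tuples are served first-in first-out, the spout emits as fast as it is allowed, and failed tuples are not replayed. The `parallelism` argument stands in for the bolt's parallelism hint and `maxPending` for `topology.max.spout.pending`; the class and method names are made up for illustration.

```java
import java.util.PriorityQueue;

// Hypothetical toy model, not Storm code: a single slow bolt dominates the latency.
public class TupleTimeoutModel {

    // Counts how many of `tuples` exceed `timeoutSecs`, measured from the moment the
    // spout emits a tuple until the slow bolt finishes it.
    //   parallelism: executors for the slow bolt (like the bolt's parallelism hint)
    //   maxPending:  cap on in-flight tuples (like topology.max.spout.pending);
    //                Integer.MAX_VALUE means "no cap"
    static int countTimeouts(int tuples, long processSecs, int parallelism,
                             int maxPending, long timeoutSecs) {
        PriorityQueue<Long> executorFreeAt = new PriorityQueue<>();
        for (int e = 0; e < parallelism; e++) {
            executorFreeAt.add(0L); // every executor is idle at t = 0
        }
        PriorityQueue<Long> inFlightDoneAt = new PriorityQueue<>();
        long lastEmit = 0;
        int timedOut = 0;

        for (int i = 0; i < tuples; i++) {
            // Forget tuples that had already completed by the previous emit.
            while (!inFlightDoneAt.isEmpty() && inFlightDoneAt.peek() <= lastEmit) {
                inFlightDoneAt.poll();
            }
            // With maxPending tuples in flight, the spout waits for the earliest to finish.
            long emitAt = lastEmit;
            if (inFlightDoneAt.size() >= maxPending) {
                emitAt = inFlightDoneAt.poll();
            }
            lastEmit = emitAt;

            // FIFO: the tuple waits for a free executor, then takes processSecs.
            long start = Math.max(emitAt, executorFreeAt.poll());
            long done = start + processSecs;
            executorFreeAt.add(done);
            inFlightDoneAt.add(done);

            // The timeout clock started at emit time, not when processing began.
            if (done - emitAt > timeoutSecs) {
                timedOut++;
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        int tuples = 20;
        long processSecs = 37; // per-tuple time of the third-party call, from the question
        int noCap = Integer.MAX_VALUE;
        System.out.println("300 s timeout, no cap, 1 executor:  "
                + countTimeouts(tuples, processSecs, 1, noCap, 300));
        System.out.println("300 s timeout, no cap, 4 executors: "
                + countTimeouts(tuples, processSecs, 4, noCap, 300));
        System.out.println("300 s timeout, max pending 1:       "
                + countTimeouts(tuples, processSecs, 1, 1, 300));
        System.out.println("30 s timeout, max pending 1, 4 ex.: "
                + countTimeouts(tuples, processSecs, 4, 1, 30));
    }
}
```

In this model, 12 of the 20 tuples time out with one executor and no cap, while either four executors or a pending limit of 1 brings that to zero. The last line also shows a limit of both remedies: if the timeout is shorter than the time a single tuple needs (30 versus 37 seconds), every tuple fails no matter how the parallelism or pending limit is set.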