When a Heroku worker is restarted (either on command or as the result of a deploy), Heroku sends SIGTERM to the worker.
Put this at the top of your job method:
    begin
      term_now = false
      old_term_handler = trap 'TERM' do
        term_now = true
        old_term_handler.call
      end
Make sure this is called at least once every ten seconds:
    if term_now
      puts 'told to terminate'
      return true
    end
At the end of your method, put this:
    ensure
      trap 'TERM', old_term_handler
    end
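Assembled into one method, the three fragments might look like the sketch below. The method name `long_job`, its `items` argument and the block that does the work are hypothetical stand-ins for your own job. The `respond_to?(:call)` guard is also an addition of mine: it only matters when no earlier handler was installed, in which case `trap` returns a string such as "DEFAULT" rather than a proc.

```ruby
# Hypothetical job method: `items` stands in for the real units of work,
# and the caller's block processes one item (a short slice of work).
def long_job(items)
  begin
    term_now = false
    old_term_handler = trap 'TERM' do
      term_now = true
      # Chain to whatever handler was installed before ours (the worker's).
      old_term_handler.call if old_term_handler.respond_to?(:call)
    end

    items.each do |item|
      # Checked once per item, so each item must take well under ten seconds.
      if term_now
        puts 'told to terminate'
        return true
      end
      yield item
    end
    true
  ensure
    # Always put the previous handler back, even on early return or error.
    trap 'TERM', old_term_handler
  end
end

done = []
long_job(%w[a b c]) { |item| done << item }
```

The check sits at the top of the loop so that the item in progress when the signal arrives is finished before the method returns.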
I was having the same problem and came upon this Heroku article.
The job contained an outer loop, so I followed the article and added a trap for 'TERM' that calls exit. However, delayed_job picks that up as failed with SystemExit and marks the task as failed.
With SIGTERM now trapped by our trap, the worker's handler isn't called; instead it immediately restarts the job and then gets SIGKILL a few seconds later. Back to square one.
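The SystemExit failure can be reproduced outside delayed_job. In the sketch below, `run_job` is a hypothetical stand-in for a job runner that treats any exception as a failure; it is not delayed_job's actual code. Calling `exit` inside a signal handler raises SystemExit in the main thread, so the runner sees an ordinary exception.

```ruby
# Hypothetical stand-in for a job runner that treats any exception,
# including SystemExit, as a failed job.
def run_job
  yield
  'succeeded'
rescue Exception => e # deliberately broad: SystemExit is not a StandardError
  "failed with #{e.class}"
end

previous = trap('TERM') { exit } # the naive handler
begin
  outcome = run_job do
    Process.kill('TERM', Process.pid) # simulate the restart signal
    sleep 1                           # "work" that the handler interrupts
  end
ensure
  trap('TERM', previous) # put the earlier handler back
end

puts outcome
```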
I tried a few alternatives to exit:
return true marks the job as successful (and removes it from the queue), but suffers from the same problem if there's another job waiting in the queue.
exit! will successfully exit the job and the worker, but it doesn't allow the worker to remove the job from the queue, so you still have the 'orphaned locked jobs' problem.
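The difference between exit and exit! can be seen in a small experiment. This is only a sketch: `cleanup_ran?` is a hypothetical helper, the inner `ensure` clause stands in for the worker's cleanup (such as unlocking the job), and it relies on `fork`, so it assumes a Unix-like platform.

```ruby
# Runs the block in a forked child and reports whether the child's
# `ensure` clause (standing in for the worker's cleanup) got to run.
def cleanup_ran?
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    begin
      begin
        yield
      ensure
        writer.write('cleanup') # the worker's cleanup would go here
      end
    rescue SystemExit
      # Swallowed so the child always leaves through exit! below.
    end
    writer.close
    exit!(0)
  end
  writer.close
  Process.wait(pid)
  reader.read == 'cleanup'
ensure
  reader&.close
end

with_exit      = cleanup_ran? { exit }     # raises SystemExit, so ensure runs
with_exit_bang = cleanup_ran? { exit!(0) } # process ends at once, no ensure

puts "exit:  cleanup ran? #{with_exit}"
puts "exit!: cleanup ran? #{with_exit_bang}"
```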
My final solution is the one given at the top of my answer. It has three parts:
Before we start the potentially long job, we add a new interrupt handler for 'TERM' by calling trap (as described in the Heroku article), and we use it to set term_now = true. But we must also grab the old_term_handler that the delayed_job worker code set (which is returned by trap) and remember to call it.
We must still ensure that we return control to Delayed:Job:Worker with sufficient time for it to clean up and shut down, so we should check term_now at least (just under) every ten seconds and return if it is true. You can either return true or return false, depending on whether you want the job to be considered successful or not.
Finally, it is vital to remember to remove your handler and reinstall the Delayed:Job:Worker one when you have finished. If you fail to do this, you will keep a dangling reference to the one we added, which can result in a memory leak if you add another one on top of that (for example, when the worker starts this job again).
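To illustrate how handlers pile up when they are not restored, here is a small experiment. Both `run_once` and `handlers_fired` are hypothetical helpers written for this sketch, and the block that logs `:worker` stands in for the worker's own handler.

```ruby
# Hypothetical helper: installs a chained TERM handler the way the job does,
# and restores the previous one only when `restore` is true.
def run_once(label, log, restore:)
  old = trap('TERM') do
    log << label
    old.call if old.respond_to?(:call)
  end
  # ... the job's work would happen here ...
ensure
  trap('TERM', old) if restore
end

# Runs the "job" three times, sends TERM once, and returns which handlers fired.
def handlers_fired(restore:)
  log = []
  original = trap('TERM') { log << :worker } # stands in for the worker's handler
  3.times { |i| run_once("job#{i}", log, restore: restore) }
  Process.kill('TERM', Process.pid)
  sleep 0.2 # give the handler time to run
  log
ensure
  trap('TERM', original)
end

leaky = handlers_fired(restore: false)
clean = handlers_fired(restore: true)

puts "without restore: #{leaky.inspect}"
puts "with restore:    #{clean.inspect}"
```

Without the restore, every run leaves its handler (and the closure it holds) chained in front of the previous one; with it, only the worker's handler remains.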