I just read the Hystrix guide and am trying to wrap my head around how the default circuit breaker and recovery period operate, and then how to customize their behavior.
Obviously, if the circuit is tripped, Hystrix will automatically call the command's
there are basically four reasons for Hystrix to call the fallback method: an exception, a timeout, too many parallel requests, or too many exceptions in the previous calls.
You might want to do a retry in your run() method if the return code or the exception you receive from your service indicate that a retry makes sense.
In your fallback method of the command you might retry when there was a timeout - when there where too many parallel requests or too many exceptions it usually makes no sense to call the same service again.
As also asked how to notify devops: You should connect a monitoring system to Hystrix that polls the status of the circuit breaker and the ratio of successful and unsuccessful calls. You can use the metrics publishers provided, JMX, or write your own adapter using Hystrix' API. I've written two adapters for Riemann and Zabbix in a tutorial I prepared; you'll new very few lines of code for that.
The tutorial also has a example application and a load driver to try out some scenarios.