naviram - 11 months ago
Ruby Question

sidekiq - Is concurrency > 50 stable?

Sidekiq documentation says:

Don't set the concurrency higher than 50. I've seen stability issues
with concurrency of 100, for example

My jobs use very little memory, so memory alone would let me run a concurrency of 350 threads on a single 512MB 1X Heroku dyno. I would like to use ~300, because all jobs are IO-intensive (HTTP requests).
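To illustrate why IO-bound jobs make high thread counts tempting, here is a minimal sketch. `fake_http_job` and `run_concurrently` are hypothetical names, and `sleep` is only a stand-in for a blocking HTTP request (both release Ruby's global lock while waiting). It is not Sidekiq code.

```ruby
require "benchmark"

# Hypothetical stand-in for an IO-bound job: sleep blocks without
# holding the global lock, much like waiting on a socket.
def fake_http_job(seconds)
  sleep(seconds)
  :done
end

# Run job_count jobs, one thread each, and collect their return values.
def run_concurrently(job_count, seconds_per_job)
  threads = Array.new(job_count) do
    Thread.new { fake_http_job(seconds_per_job) }
  end
  threads.map(&:value)
end

elapsed = Benchmark.realtime { run_concurrently(20, 0.05) }
puts format("20 jobs of 0.05s each finished in %.2fs", elapsed)
```

Run serially, these 20 jobs would need about one second; with one thread per job the waits overlap. This says nothing about how many threads the dyno will actually allow.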

I wonder what issues I might run into.

I monitored the logs under heavy load with a concurrency of 80 and saw no issues.

What issues should I expect when setting a concurrency of 300 threads? Do I risk jobs being terminated without being moved to the "dead" queue, or only a termination of workers that I will be able to observe?
Is it safe to set a concurrency of 300, or even 100?

The owner of Sidekiq doesn't know the answer. I opened an issue about it.

Under high load, when I increased the concurrency from 80 to 100, I started getting occasional 'can't create Thread: Resource temporarily unavailable' errors. In extreme cases, at 180 threads, the entire Sidekiq process was sometimes terminated.
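For context, that message is what Ruby raises as a `ThreadError` when the operating system refuses to create another thread. Since memory stayed low, a per-process thread limit on the dyno is one plausible cause, but that is an assumption and not something confirmed in the issue. A minimal sketch of catching the error when spawning threads by hand (`spawn_workers` is a hypothetical helper, not part of Sidekiq):

```ruby
# Try to start `count` threads that each run the given block.
# Stop at the first ThreadError instead of letting it crash the process.
def spawn_workers(count)
  threads = []
  count.times do
    begin
      threads << Thread.new { yield }
    rescue ThreadError => e
      # Raised when the OS cannot create another thread.
      warn "stopped after #{threads.size} threads: #{e.message}"
      break
    end
  end
  threads
end

workers = spawn_workers(8) { sleep 0.01 }
workers.each(&:join)
puts "started #{workers.size} threads"
```

This only shows where the error surfaces in plain Ruby. Sidekiq manages its own threads, so the sketch is not a fix for the behaviour described above.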

Memory consumption was always between 140MB and 240MB according to Heroku metrics.

I sent the TTIN signal to get a backtrace of every thread, as described here

and found that most threads were waiting on these lines of code:

app[worker.1]: 3 TID-ow5z46exw WARN: /app/vendor/ruby-2.3.0/lib/ruby/2.3.0/monitor.rb:187:in `lock'

app[worker.1]: 3 TID-os9ulw8ps WARN: /app/vendor/ruby-2.3.0/lib/ruby/2.3.0/net/http.rb:880:in `initialize'

app[worker.1]: 3 TID-os9ulw8ps WARN: /app/vendor/ruby-2.3.0/lib/ruby/2.3.0/timeout.rb:95:in `join'

app[worker.1]: 3 TID-osjnd6zac WARN: /app/vendor/ruby-2.3.0/lib/ruby/2.3.0/net/protocol.rb:158:in `wait_readable'
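For readers who have not seen such a dump before, a rough sketch of how one can be produced in plain Ruby follows. `thread_dump` is a hypothetical helper and the `TID-` label is only illustrative; this is not Sidekiq's actual TTIN handler.

```ruby
# Collect the status and backtrace of every live thread.
# Thread#backtrace returns nil for dead threads, hence the fallback.
def thread_dump
  Thread.list.map do |thread|
    {
      id: thread.object_id.to_s(36),
      status: thread.status,
      backtrace: thread.backtrace || []
    }
  end
end

sleeper = Thread.new { sleep } # a thread that blocks forever
sleep 0.05                     # give it time to start blocking

thread_dump.each do |entry|
  puts "TID-#{entry[:id]} status=#{entry[:status]}"
  entry[:backtrace].first(3).each { |line| puts "  #{line}" }
end

sleeper.kill
```

The top frame of each backtrace shows where that thread is currently waiting, which is how the lines above point at `monitor.rb`, `net/http.rb`, `timeout.rb` and `net/protocol.rb`.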

Everything is documented in the GitHub issue.

The owner of Sidekiq says that the traces look fine, so the root cause of the stability issue is still unknown. The issue does, however, record how many threads trigger the problem and what the symptoms are.

Answer Source

The Sidekiq stability issues at high concurrency are as follows.

When you set a concurrency higher than 80 (or 50), you may encounter this error: "can't create Thread: Resource temporarily unavailable".

Some jobs will be returned to the queue. Sometimes the entire process is terminated and jobs are lost, unless you use the Sidekiq Pro reliability feature.

This happens even while memory consumption stays low (< 240MB in my example).

All the details and updates are in the GitHub issue.