Mithril Mithril - 2 months ago 8
Python Question

Make supervisor stop Celery workers correctly

I meet a lot weird thing when using celery. Such as, I update tasks.py,

supervisorctl reload
(restart), but tasks is wrong. Some tasks seems disappear and so on.

Today I found that because
supervisorctl stop all
can not stop all celery workers. And only kill -9 'pgrep python' can kill them all.

situation:

root@ubuntu12:/data/www/article_fetcher# supervisorctl
celery_beat RUNNING pid 29597, uptime 0:52:18
celery_worker1 RUNNING pid 29556, uptime 0:52:20
celery_worker2 RUNNING pid 29570, uptime 0:52:19
celery_worker3 RUNNING pid 29557, uptime 0:52:20
celery_worker4 RUNNING pid 29586, uptime 0:52:18
uwsgi RUNNING pid 29604, uptime 0:52:18
supervisor> stop all
celery_beat: stopped
celery_worker2: stopped
celery_worker4: stopped
celery_worker3: stopped
uwsgi: stopped
celery_worker1: stopped
supervisor> status
celery_beat STOPPED Aug 04 11:05 AM
celery_worker1 STOPPED Aug 04 11:05 AM
celery_worker2 STOPPED Aug 04 11:05 AM
celery_worker3 STOPPED Aug 04 11:05 AM
celery_worker4 STOPPED Aug 04 11:05 AM
uwsgi STOPPED Aug 04 11:05 AM


processes:

root@ubuntu12:~# ps -aux|grep 'python'
Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
root 8683 0.0 0.1 61420 11768 ? Ss Aug03 0:27 /usr/bin/python /usr/bin/supervisord
root 29310 0.1 0.1 57120 11344 pts/2 S+ 11:05 0:00 /usr/bin/python /usr/bin/supervisorctl
nobody 29556 2.2 0.5 132484 45988 ? S 11:06 0:00 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W1 -Ofair --app=celery_worker:app
nobody 29557 2.2 0.5 132480 45996 ? S 11:06 0:00 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
nobody 29570 2.4 0.5 132740 45996 ? S 11:06 0:00 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W2 -Ofair --app=celery_worker:app
nobody 29571 26.9 1.4 217688 115804 ? R 11:06 0:09 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
nobody 29572 33.7 0.7 158396 59808 ? R 11:06 0:12 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
nobody 29573 29.6 1.4 215176 115928 ? R 11:06 0:10 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W1 -Ofair --app=celery_worker:app
nobody 29574 27.2 1.4 218244 118180 ? R 11:06 0:09 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
......
......
......


I found this question:Stopping Supervisor doesn't stop Celery workers, but it is asking different thing, the accepted answer
supervisorctl stop all
do not work actually.So I decide find the right way.

Answer

I look into supervisor docs and find this:

killasgroup

If true, when resorting to send SIGKILL to the program to terminate it send it to its whole process group instead, taking care of its children as well, useful e.g with Python programs using multiprocessing.

Default: false

Required: No.

Introduced: 3.0a11

Then I think that each worker create 4 child process(by cpu cores) become a process group, that's why supervisorctl stop all do not work.
So I add killasgroup to supervisord.conf:

    [program:celery_worker1]
    ; Set full path to celery program if using virtualenv

    directory=/data/www/article_fetcher

    command=/data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W2 -Ofair --app=celery_worker:app
    user=nobody
    numprocs=1
    stdout_logfile=/data/www/article_fetcher/logs/celery.log
    stderr_logfile=/data/www/article_fetcher/logs/celery.log
    autostart=true
    autorestart=true
    startsecs=5
    killasgroup=true

    .....
    .....

Then supervisorctl stop all really stop celery workers! very well~