Make supervisor stop Celery workers correctly

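I have run into a lot of strange behavior while using Celery. For example, I updated tasks.py and ran supervisorctl reload (restart), but the tasks failed. Some tasks seemed to disappear, and so on.
Today I found the cause: supervisorctl stop all does not stop all of the Celery workers. Only kill -9 `pgrep python` kills them all.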

The situation:

    root@ubuntu12:/data/www/article_fetcher# supervisorctl
    celery_beat                      RUNNING    pid 29597, uptime 0:52:18
    celery_worker1                   RUNNING    pid 29556, uptime 0:52:20
    celery_worker2                   RUNNING    pid 29570, uptime 0:52:19
    celery_worker3                   RUNNING    pid 29557, uptime 0:52:20
    celery_worker4                   RUNNING    pid 29586, uptime 0:52:18
    uwsgi                            RUNNING    pid 29604, uptime 0:52:18
    supervisor> stop all
    celery_beat: stopped
    celery_worker2: stopped
    celery_worker4: stopped
    celery_worker3: stopped
    uwsgi: stopped
    celery_worker1: stopped
    supervisor> status
    celery_beat                      STOPPED    Aug 04 11:05 AM
    celery_worker1                   STOPPED    Aug 04 11:05 AM
    celery_worker2                   STOPPED    Aug 04 11:05 AM
    celery_worker3                   STOPPED    Aug 04 11:05 AM
    celery_worker4                   STOPPED    Aug 04 11:05 AM
    uwsgi                            STOPPED    Aug 04 11:05 AM

The processes that are still running afterwards:

    root@ubuntu12:~# ps -aux|grep 'python'
    Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
    root      8683  0.0  0.1  61420 11768 ?        Ss   Aug03   0:27 /usr/bin/python /usr/bin/supervisord
    root     29310  0.1  0.1  57120 11344 pts/2    S+   11:05   0:00 /usr/bin/python /usr/bin/supervisorctl
    nobody   29556  2.2  0.5 132484 45988 ?        S    11:06   0:00 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W1 -Ofair --app=celery_worker:app
    nobody   29557  2.2  0.5 132480 45996 ?        S    11:06   0:00 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
    nobody   29570  2.4  0.5 132740 45996 ?        S    11:06   0:00 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W2 -Ofair --app=celery_worker:app
    nobody   29571 26.9  1.4 217688 115804 ?       R    11:06   0:09 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
    nobody   29572 33.7  0.7 158396 59808 ?        R    11:06   0:12 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
    nobody   29573 29.6  1.4 215176 115928 ?       R    11:06   0:10 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W1 -Ofair --app=celery_worker:app
    nobody   29574 27.2  1.4 218244 118180 ?       R    11:06   0:09 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
    ......
    ......
    ......

I found this question: Stopping Supervisor doesn't stop Celery workers, but it asks about something different, and the accepted answer (supervisorctl stop all) does not actually work. So I decided to find the correct way.

I looked through the supervisor docs and found:

killasgroup

If true, when resorting to send SIGKILL to the program to terminate it send it to its whole process group instead, taking care of its children as well, useful e.g with Python programs using multiprocessing.

Default: false

Required: No.

Introduced: 3.0a11
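
To see what that option changes, here is a minimal Python 3 sketch (POSIX only) that compares sending SIGKILL to a parent process alone with sending it to the whole process group. Everything in it is an assumption made for illustration: the WORKER script is only a stand-in for a Celery worker with one pool child, children_survive is a hypothetical helper, and start_new_session=True is used to give the fake worker its own process group, which as far as I understand is roughly what supervisord arranges for each program.

```python
import os
import select
import signal
import subprocess
import sys

# Stand-in for a Celery worker (an assumption, not real Celery code):
# the parent starts one child, reports that it is ready, then both sleep.
WORKER = (
    "import subprocess, sys, time\n"
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(30)'])\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)


def children_survive(kill_group):
    """SIGKILL the fake worker and report whether its child outlived it."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER],
        stdout=subprocess.PIPE,
        start_new_session=True,  # the fake worker leads its own process group
    )
    proc.stdout.readline()  # wait until the child has been started

    if kill_group:
        os.killpg(proc.pid, signal.SIGKILL)  # signal every process in the group
    else:
        os.kill(proc.pid, signal.SIGKILL)    # signal the parent only
    proc.wait()

    # The child inherited the write end of the stdout pipe, so EOF shows up
    # only after every process holding that pipe has exited.
    readable, _, _ = select.select([proc.stdout], [], [], 2.0)
    survived = not readable

    if survived:
        os.killpg(proc.pid, signal.SIGKILL)  # clean up the orphaned child
    proc.stdout.close()
    return survived


if __name__ == "__main__":
    print("kill parent only, child survives:", children_survive(False))
    print("kill process group, child survives:", children_survive(True))
```

The pipe check is used instead of probing the child's PID because a killed orphan can linger briefly as a zombie, which would make a PID probe misleading.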

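So my guess is that each worker spawns 4 child processes (one per CPU core), and together they form a process group. Without killasgroup, supervisor only kills the parent process, which is why supervisorctl stop all leaves the children running.
So I added killasgroup to supervisord.conf: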

    [program:celery_worker1]
    ; Set full path to celery program if using virtualenv

    directory=/data/www/article_fetcher

    command=/data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W2 -Ofair --app=celery_worker:app
    user=nobody
    numprocs=1
    stdout_logfile=/data/www/article_fetcher/logs/celery.log
    stderr_logfile=/data/www/article_fetcher/logs/celery.log
    autostart=true
    autorestart=true
    startsecs=5
    killasgroup=true

    .....
    .....

Now supervisorctl stop all really stops the Celery workers! Great~
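
If you want to double-check that nothing is left behind after stop all, a small Python 3 script can scan the process list instead of reading the ps output by eye. This is only a sketch: leftover_worker_pids is a hypothetical helper name, and the "celery worker" marker is an assumption based on the command lines shown above, so adjust it to match how your workers are started.

```python
import subprocess


def leftover_worker_pids(ps_output, marker="celery worker"):
    """Return PIDs from `ps -eo pid,args` output whose command contains marker."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # split into PID and the full command line
        # The header line is skipped because its first field is not a number.
        if len(parts) == 2 and parts[0].isdigit() and marker in parts[1]:
            pids.append(int(parts[0]))
    return pids


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    print(leftover_worker_pids(out) or "no Celery workers left")
```

Run it right after supervisorctl stop all. An empty result means the parents and their pool children are all gone.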