uWSGI workers stay idle after respawning

Environment

Problem

I use nginx + uWSGI + Django to run a web server. Sometimes responses become too slow, and I have to reload the server with the touch command that uWSGI provides.

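After monitoring my uWSGI server with uwsgitop, I found that some uWSGI workers stay idle after respawning, and that these workers report an RSS and VSZ of zero, as shown in the screenshot below.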
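The same check can be scripted instead of watching uwsgitop. The following is a minimal sketch, assuming the stats socket from the configuration further down is a Unix socket that returns one JSON document per connection, and that each entry under "workers" carries "id", "pid", "status", "rss" and "vsz" fields (the memory fields are only meaningful with memory-report enabled). The function names are my own, not part of uWSGI.

```python
import json
import socket


def read_stats(sock_path):
    """Read the JSON document the uWSGI stats server sends on connect."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(sock_path)
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def zero_memory_workers(stats):
    """Return (id, pid, status) for workers reporting rss == 0 and vsz == 0."""
    return [
        (w.get("id"), w.get("pid"), w.get("status"))
        for w in stats.get("workers", [])
        if w.get("rss") == 0 and w.get("vsz") == 0
    ]
```

Calling zero_memory_workers(read_stats("/my/server/path/nginx_uwsgi/stats.sock")) from a cron job or a shell loop would list the suspicious workers without keeping uwsgitop open.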

I found no error messages in the uWSGI log. The respawn messages look normal:

worker 6 killed successfully (pid: 14872)
Respawned uWSGI worker 6 (new pid: 5545)

worker 9 killed successfully (pid: 14878)
Respawned uWSGI worker 9 (new pid: 3807)

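If I force a respawn with kill -9 <worker-pid>, most of the time the new worker comes up with a nonzero RSS and VSZ and starts serving requests, but sometimes it comes up with zero RSS and VSZ and stays idle.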

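I have tried my best, but I cannot figure out what happens to the respawned workers. I posted an issue on the uWSGI project, but got no response for a long time (it is probably not a uWSGI problem).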

Any suggestions for debugging or inspecting this problem?

For reference, here is my uWSGI configuration:

# uwsgi.ini file
[uwsgi]

# deploy root
deploy_root     = /my/server/path

# Django-related settings
# the base directory (full path)
chdir           = %(deploy_root)/
# Django's wsgi file
module          = MyServer.wsgi
# the virtualenv (full path)
home            = %(deploy_root)/env/

# process-related settings
# master
master          = true
# maximum number of worker processes
processes       = 10
# socket listen queue size, default 100
listen          = 1024
# respawn processes taking more than 300 seconds
harakiri        = 300
# respawn processes after serving 5000 requests
max-requests    = 5000
# the socket (use the full path to be safe)
socket          = %(deploy_root)/nginx_uwsgi/server.sock
# ... with appropriate permissions - may be needed
chmod-socket    = 666
# clear environment on exit
vacuum          = true
# run background with log file
daemonize      = %(deploy_root)/nginx_uwsgi/logs/uwsgi.log
# use pid file to stop uwsgi easily
pidfile        = %(deploy_root)/nginx_uwsgi/uwsgi.pid
# use utf8
env            = PYTHONIOENCODING=UTF-8
# use threads
enable-threads = true

# stats socket (use the full path to be safe)
stats          = %(deploy_root)/nginx_uwsgi/stats.sock
# show memory resources uwsgi processes are consuming
memory-report  = true

As my comment on the uWSGI issue "Some worker keep idle after respawning" shows, this problem was caused by the third-party module APScheduler.

I inspected an idle worker with strace and gdb and found that it was stuck at "Waiting for the GIL".

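So I believe this was caused by a thread I created in the main process. The only thread I introduced in the main process is the APScheduler background scheduler instance, which I use to launch cron jobs.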
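The general hazard behind this can be reproduced without uWSGI. The sketch below is an analogy, not the GIL itself: it uses an ordinary threading.Lock held by a background thread at the moment of os.fork(). Only the forking thread exists in the child, so nothing there can ever release the lock. The function name is hypothetical, and the code assumes a POSIX system.

```python
import os
import threading


def child_sees_lock_held(timeout=0.5):
    """Fork while a background thread holds a lock and report whether
    the child still finds that lock held (POSIX only)."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            holding.set()   # tell the main thread the lock is now held
            release.wait()  # keep holding it until told to stop

    t = threading.Thread(target=holder, daemon=True)
    t.start()
    holding.wait()

    pid = os.fork()
    if pid == 0:
        # Child: the holder thread was not copied, but the lock's
        # "locked" state was, so this acquire can only time out.
        got_it = lock.acquire(timeout=timeout)
        os._exit(0 if got_it else 1)

    # Parent: collect the child's verdict, then let the holder finish.
    _, status = os.waitpid(pid, 0)
    release.set()
    t.join()
    return os.WEXITSTATUS(status) == 1
```

Without the timeout the child would block forever, which matches the symptom of a respawned worker that never starts serving. Recent Python versions may also emit a DeprecationWarning when forking a multi-threaded process, for exactly this reason.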

After I moved the scheduler job logic into a separate process, the problem never happened again.
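One way such a separation could look is a small standalone script that is started independently of uWSGI. This is only a sketch under assumptions: it uses the APScheduler 3.x BlockingScheduler API, and the job function and its schedule are hypothetical placeholders rather than my real jobs.

```python
def refresh_cache():
    """Hypothetical cron job standing in for the real job logic."""
    return "cache refreshed"


def run_scheduler():
    # Imported here so that merely importing this module (for example
    # from the web application) never loads or starts a scheduler.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler()
    # Hypothetical schedule: every day at 03:00.
    scheduler.add_job(refresh_cache, "cron", hour=3, minute=0)
    scheduler.start()  # blocks; this process does nothing but run jobs


if __name__ == "__main__":
    run_scheduler()
```

Because the scheduler thread now lives in its own process, the uWSGI process that forks workers has no extra Python threads at fork time.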