Celery does not detect closed broker connections and freezes after a while

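I am running celery==3.1.16 on Python 2.7 with a handful of tasks, using Redis as the broker. The worker runs fine at first, but after a while it hangs. I checked the TCP connections inside the container and saw this: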

tcp        1      0 WORKER:34472 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39884 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57292 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:60030 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:39906 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57102 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:34508 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39874 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57182 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57106 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39870 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57056 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39902 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:34494 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:40878 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:39878 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:53138 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:43818 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:39876 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:50586 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:59800 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57128 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57238 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57346 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57050 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39896 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:44850 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57124 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:39904 REDIS_HOST:6379 CLOSE_WAIT 
tcp        0      0 WORKER:39872 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:57160 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57190 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:59724 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:57260 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:34492 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:59740 REDIS_HOST:6379 CLOSE_WAIT 
tcp        1      0 WORKER:55426 REDIS_HOST:6379 CLOSE_WAIT

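I think the celery worker is not detecting the closed connections and re-establishing them, so I have to restart the worker pod periodically to get it working again.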
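For context, CLOSE_WAIT means the remote side (here Redis) has already closed the connection, while the local process has not yet closed its own socket. To watch how these pile up over time, a small script can tally the states from netstat-style output. This is only a rough sketch: the function name `count_tcp_states` is made up for illustration, and it assumes the six-column layout shown in the listing above.

```python
from collections import Counter


def count_tcp_states(netstat_output, remote_port=6379):
    """Tally TCP states for connections whose remote port matches."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto, recv-q, send-q, local, remote, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote = fields[4]
        if remote.rsplit(":", 1)[-1] == str(remote_port):
            states[fields[5]] += 1
    return states


# A few lines in the same shape as the listing above.
sample = """\
tcp        1      0 WORKER:34472 REDIS_HOST:6379 CLOSE_WAIT
tcp        0      0 WORKER:60030 REDIS_HOST:6379 ESTABLISHED
tcp        1      0 WORKER:39884 REDIS_HOST:6379 CLOSE_WAIT
"""

counts = count_tcp_states(sample)
print(dict(counts))
```

Feeding it the real netstat output at intervals would show whether the CLOSE_WAIT count keeps growing while the worker is stuck.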

Any ideas?

I solved this by running celery with the solo pool and scaling out with multiple Kubernetes pods instead of a single pod, since the solo pool is single-threaded.

https://docs.celeryproject.org/en/stable/internals/reference/celery.concurrency.solo.html
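As a rough illustration of that setup, the sketch below builds the command line that each pod would run to start one solo-pool worker. The helper name `solo_worker_command` and the app module `proj` are hypothetical; `--pool=solo` and `--loglevel` are standard options of the `celery worker` command.

```python
def solo_worker_command(app_module, loglevel="info"):
    """Build the argv for a single solo-pool worker process."""
    return [
        "celery",
        "-A", app_module,          # hypothetical app module name
        "worker",
        "--pool=solo",             # run tasks inline in the worker process
        "--loglevel=" + loglevel,
    ]


command = solo_worker_command("proj")
print(" ".join(command))
# To actually launch it from Python: subprocess.call(command)
```

Each pod then runs exactly one such process, and throughput is scaled by raising the replica count of the deployment rather than the worker's concurrency.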