Celery upgrade (3.1->4.1) - Connection reset by peer

We have been using Celery for the past year, with around 15 workers, each running with a concurrency of 1-4.

Recently we upgraded Celery from v3.1 to v4.1.

Now the following error appears in every worker's log. Any idea what is causing it?

2017-08-21 18:33:19,780 94794  ERROR   Control command error: error(104, 'Connection reset by peer') [file: pidbox.py, line: 46]
Traceback (most recent call last):
  File "/srv/dy/venv/lib/python2.7/site-packages/celery/worker/pidbox.py", line 42, in on_message
    self.node.handle_message(body, message)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 129, in handle_message
    return self.dispatch(**body)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 112, in dispatch
    ticket=ticket)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 135, in reply
    serializer=self.mailbox.serializer)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 265, in _publish_reply
    **opts
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
    exchange_name, declare,
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
    mandatory=mandatory, immediate=immediate,
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/channel.py", line 1748, in _basic_publish
    (0, exchange, routing_key, mandatory, immediate), msg
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 64, in send_method
    conn.frame_writer(1, self.channel_id, sig, args, content)
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/method_framing.py", line 178, in write_frame
    write(view[:offset])
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/transport.py", line 272, in write
    self._write(s)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer

BTW: our tasks are of the form:

@app.task(name='EXAMPLE_TASK',
          bind=True,
          base=ConnectionHolderTask)
def example_task(self, arg1, arg2, **kwargs):
    # task code
    pass

We have our fair share of problems with Celery too... I spend 20% of my time dealing with weird idle-hang/crash issues on our workers, sigh.

We had a similar case that was caused by high concurrency combined with a high worker_prefetch_multiplier; as it turned out, fetching thousands of tasks at once is a good way to break the connection.
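
If that matches your setup, lowering the multiplier is worth a try. A minimal sketch, assuming a Celery 4.x app object (the 'proj' name and broker URL are placeholders):

from celery import Celery

app = Celery('proj', broker='amqp://guest@localhost//')

# Celery 4.x defaults to 4; each worker reserves roughly
# concurrency * worker_prefetch_multiplier tasks at a time,
# so high values make workers fetch tasks in huge batches.
# A value of 1 keeps each worker process to one reserved task.
app.conf.worker_prefetch_multiplier = 1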

If that's not it: try disabling the broker pool by setting broker_pool_limit to None.
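
That is a plain Celery setting as well; a sketch on the same hypothetical app object:

from celery import Celery

app = Celery('proj', broker='amqp://guest@localhost//')  # placeholder names

# None disables the broker connection pool: a connection is then
# established and closed for every use instead of being reused,
# which avoids publishing on a connection the broker already reset.
app.conf.broker_pool_limit = None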

Just some quick ideas that might (hopefully) help :-)