Celery: what is the reason to have acks_late=true without setting task_reject_on_worker_lost=true?

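After playing around with some "failure" scenarios in Celery (with Redis as the broker, for what it's worth), we came to understand that setting acks_late=true without also setting task_reject_on_worker_lost=true makes no real sense, because the task is not rescheduled (again, in our tests): it stays in the "unacked" category forever.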

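At the same time, everyone says that acks_late causes the task to be rescheduled on the same or another worker, so the question is: when does that actually happen?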

The official docs say:

Note that the worker will acknowledge the message if the child process executing the task is terminated (either by the task calling sys.exit(), or by signal) even when acks_late is enabled. This behavior is intentional as…

  • We don’t want to rerun tasks that forces the kernel to send a SIGSEGV (segmentation fault) or similar signals to the process.

  • We assume that a system administrator deliberately killing the task does not want it to automatically restart.

  • A task that allocates too much memory is in danger of triggering the kernel OOM killer, the same may happen again.

  • A task that always fails when redelivered may cause a high-frequency message loop taking down the system.

If you really want a task to be redelivered in these scenarios you should consider enabling the task_reject_on_worker_lost setting.
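
For reference, the two options discussed here could be enabled together roughly as follows. This is only a sketch of a celeryconfig.py-style settings module (the module name is an assumption; it would be loaded with something like app.config_from_object("celeryconfig")), not a recommendation for every workload:

```python
# celeryconfig.py (hypothetical module name, used for illustration only)

# Acknowledge the message only after the task has finished running,
# instead of right before it starts executing.
task_acks_late = True

# Requeue the message when the child process running the task dies
# (for example, killed by a signal) instead of acknowledging it.
task_reject_on_worker_lost = True
```

The same behavior can be requested per task through the acks_late and reject_on_worker_lost task options. As the quoted warning notes, a task that always crashes its worker may then be redelivered in a loop.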

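What are some possible examples of "something going wrong" that do not fall into the category of "worker deliberately terminated or killed by a signal"?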

A reboot, a power outage, a hardware failure. N.b., all of your examples assume a prefetch multiplier of 1.
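
The prefetch remark and the redelivery timing can be tied to concrete settings. A hedged sketch, again as a plain settings module: worker_prefetch_multiplier defaults to 4, and with the Redis transport an unacknowledged message is, as far as I understand it, only handed to another worker once the visibility_timeout (one hour by default) has passed, which may be why a task looks stuck in "unacked" during a short test. The values below are illustrative assumptions, not tuned recommendations:

```python
# Reserve only one message per worker process at a time, so that a worker
# that dies is not also holding a batch of prefetched, unacknowledged tasks.
worker_prefetch_multiplier = 1

# Redis transport: seconds to wait before an unacknowledged message is
# redelivered to another worker. 3600 matches the documented default.
broker_transport_options = {"visibility_timeout": 3600}
```

Lowering visibility_timeout would shorten the wait after a reboot or power loss, but any task that legitimately runs longer than the timeout would then be delivered a second time while the first copy is still running.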