Celery: what is the reason to have acks_late=true without setting task_reject_on_worker_lost=true
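After playing with some “failure” scenarios in Celery (with Redis as the broker, for what it’s worth), we came to understand that setting acks_late=true without also setting task_reject_on_worker_lost=true is effectively useless, because the task is never rescheduled (again, in our tests): it just stays in the “unacked” category forever.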
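For reference, here is a minimal sketch of the settings discussed in this thread, written as a plain Celery configuration module. The module name `celeryconfig` is a hypothetical choice; it would be loaded with `app.config_from_object("celeryconfig")`. The `visibility_timeout` entry is added for context and is not part of the question. With Redis as the broker, a message that was never acknowledged is only made visible to other workers again after this timeout, which defaults to one hour. That may be why messages appear to stay “unacked” during short tests.

```python
# celeryconfig.py (hypothetical module name)

# Acknowledge the message after the task has run, not just before it starts.
task_acks_late = True

# If the child process running the task dies (sys.exit(), a signal, the OOM
# killer), reject and requeue the message instead of acknowledging it.
task_reject_on_worker_lost = True

# Prefetch at most one message per worker process (Celery's default is 4).
worker_prefetch_multiplier = 1

# Redis transport only: seconds an unacknowledged message stays reserved
# before it is made visible to other workers again (3600 is the default).
broker_transport_options = {"visibility_timeout": 3600}
```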
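At the same time, everyone says that acks_late causes the task to be rescheduled on the same or another worker, so the question is: when does that actually happen?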
The official docs say:
Note that the worker will acknowledge the message if the child process executing the task is terminated (either by the task calling sys.exit(), or by signal) even when acks_late is enabled. This behavior is intentional as…
We don’t want to rerun tasks that forces the kernel to send a SIGSEGV (segmentation fault) or similar signals to the process.
We assume that a system administrator deliberately killing the task does not want it to automatically restart.
A task that allocates too much memory is in danger of triggering the kernel OOM killer, the same may happen again.
A task that always fails when redelivered may cause a high-frequency message loop taking down the system.
If you really want a task to be redelivered in these scenarios you should consider enabling the task_reject_on_worker_lost setting.
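The behaviour described in the quoted docs can be summarised as a small decision table. The sketch below is a simplified, hypothetical model of what happens to the broker message for the task that was executing. The `Failure` enum and `message_outcome` function are illustrative names, not Celery APIs. The model ignores prefetched messages and settings such as time limits.

```python
from enum import Enum, auto


class Failure(Enum):
    NONE = auto()            # task returned normally
    TASK_EXCEPTION = auto()  # task raised an ordinary exception
    CHILD_KILLED = auto()    # child process died: sys.exit(), SIGSEGV, SIGKILL, OOM killer
    WORKER_GONE = auto()     # whole worker vanished: reboot, power loss, hardware failure


def message_outcome(failure: Failure, acks_late: bool, reject_on_worker_lost: bool) -> str:
    """Hypothetical model of the fate of the message for the task being executed."""
    if not acks_late:
        # Early ack: the message was acknowledged just before the task started,
        # so whatever happens afterwards, the broker will not redeliver it.
        return "acked"
    if failure is Failure.WORKER_GONE:
        # No process is left to ack or reject; the broker hands the message
        # out again later (with Redis, once visibility_timeout has expired).
        return "redelivered_later"
    if failure is Failure.CHILD_KILLED and reject_on_worker_lost:
        # The surviving main worker process rejects the message with requeue.
        return "requeued"
    # Success, ordinary exceptions and, by default, a lost child process.
    return "acked"


for f in Failure:
    print(f.name, message_outcome(f, acks_late=True, reject_on_worker_lost=False))
```

Under this model, the only case where acks_late alone leads to re-execution is `WORKER_GONE`. Nothing is left running to acknowledge or reject the message, so the broker eventually hands it out again.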
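What are possible examples of “something went wrong” that don’t fall into the “worker terminated deliberately or due to a caught signal” category?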
Reboots, power outages, hardware failures. n.b., all of your examples assume that the prefetch multiplier is 1.
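One way to see why the prefetch multiplier matters: a worker reserves up to concurrency × worker_prefetch_multiplier messages at once. The reserved messages that have not started yet are still unacknowledged, even without acks_late. The hypothetical function below counts how many messages a fully busy worker would leave unacknowledged, and therefore eligible for redelivery, if it suddenly lost power. It assumes the queue is deep enough to fill every reserved slot and that each child process runs one task at a time.

```python
def unacked_on_power_loss(concurrency: int, prefetch_multiplier: int, acks_late: bool) -> int:
    """Hypothetical model: messages a fully busy worker holds unacknowledged when it dies."""
    if concurrency < 1 or prefetch_multiplier < 1:
        # Celery treats a multiplier of 0 as "no limit"; that case is not modeled here.
        raise ValueError("concurrency and prefetch_multiplier must be >= 1")
    reserved = concurrency * prefetch_multiplier  # prefetch limit for the whole worker
    running = concurrency                          # one task per child process
    waiting = reserved - running                   # prefetched but not started yet
    # Waiting messages are not acknowledged before they start, in either mode.
    # Running messages were already acknowledged (early ack) unless acks_late is set.
    return waiting + (running if acks_late else 0)


for multiplier in (1, 4):
    for late in (False, True):
        print(f"multiplier={multiplier} acks_late={late}: "
              f"{unacked_on_power_loss(4, multiplier, late)} unacked")
```

With four processes, this prints 0 and 4 unacked messages for a multiplier of 1 (early and late ack), and 12 and 16 for a multiplier of 4.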