rq (redis queue) work-horse terminated unexpectedly, suggestions on how to debug?

I'm using RQ workers to process a large number of jobs, but I've run into a problem.

Observations

{"message": "my_queue: my_job() (dcf797c4-1434-4b77-a344-5bbb1f775113)"}
{"message": "Killed horse pid 8451"}
{"message": "Moving job to FailedJobRegistry (work-horse terminated unexpectedly; waitpid returned None)"}

        while True:
            try:
                with UnixSignalDeathPenalty(self.job_monitoring_interval, HorseMonitorTimeoutException):
                    retpid, ret_val = os.waitpid(self._horse_pid, 0)
                break
            except HorseMonitorTimeoutException:
                # Horse has not exited yet and is still running.
                # Send a heartbeat to keep the worker alive.
                self.heartbeat(self.job_monitoring_interval + 5)

                # Kill the job from this side if something is really wrong (interpreter lock/etc).
                if job.timeout != -1 and (utcnow() - job.started_at).total_seconds() > (job.timeout + 1):
                    self.kill_horse()
                    break
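
To test whether the timeout branch in the loop above is what triggered "Killed horse pid", the condition can be pulled out into a small pure function and fed the failed job's timestamps. This is only a sketch: `exceeded_timeout` is a hypothetical helper, not part of RQ, and the 180-second timeout and the timestamps below are made-up values for illustration.

```python
from datetime import datetime, timedelta


def exceeded_timeout(started_at: datetime, timeout: int, now: datetime) -> bool:
    """Mirror the quoted check: True once the job has run longer than timeout + 1 seconds.

    A timeout of -1 is treated as "no limit", as in the quoted condition.
    Both datetimes must be either naive or timezone-aware, not mixed.
    """
    if timeout == -1:
        return False
    elapsed = (now - started_at).total_seconds()
    return elapsed > timeout + 1


if __name__ == "__main__":
    # Hypothetical values: a job with a 180-second timeout, checked 200 seconds after start.
    started = datetime(2020, 1, 1, 12, 0, 0)
    checked = started + timedelta(seconds=200)
    print(exceeded_timeout(started, 180, checked))
```

If this returns True for the job's `started_at` and the time of the "Killed horse" log line, the job most likely just ran past its timeout, and raising the timeout when enqueuing would be the first thing to try. If it returns False, the kill came from somewhere else and the timeout is probably not the cause.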

What should I do next?

I think the latest RQ release (https://github.com/rq/rq/releases/tag/v1.4.0) may contain a fix. Its release notes say:

Fixed a bug that may cause early termination of scheduled or requeued jobs. Thanks @rmartin48!