Django RQ rqworker freezes indefinitely
This week my integration tests stopped working. I traced it to a django-rq job that stalls indefinitely. My output:
$: RQ worker 'rq:worker:47e0aaf280be.13' started, version 0.12.0
$: *** Listening on default...
$: Cleaning registries for queue: default
$: default: myapp.engine.rules.process_event(<myapp.engine.event.Event object at 0x7f34f1ce50f0>) (a1e66a46-1a9d-4f52-be6f-6f4529dd2480)
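This is the point where it freezes. I have to stop it with a keyboard interrupt.
The code has not changed. To be sure, I went back to the master branch, checked it out, re-ran the integration tests, and they failed too.
How can I start debugging redis or rq from a Python test case to understand what might be happening? Is there a way to look at the actual queue records through Python? The Redis queue only exists while the tests are running, and because it is frozen I can inspect it with redis-cli through the Docker container that runs the Redis service.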
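One way to look at the queue records from Python, without going through rq's classes, is to read the Redis keys rq maintains. The sketch below assumes rq 0.12's key naming (a list "rq:queue:<name>" of queued job IDs, a sorted set "rq:wip:<name>" for jobs a worker has started, and a hash "rq:job:<id>" with a "status" field); confirm the layout with redis-cli KEYS 'rq:*' in the Redis container before relying on it. The helper names are hypothetical.

```python
"""Read rq's bookkeeping keys straight from Redis to see what a queue holds.

Assumes rq 0.12's key layout: "rq:queue:<name>" (list of queued job IDs),
"rq:wip:<name>" (sorted set used by StartedJobRegistry) and
"rq:job:<id>" (hash per job with a "status" field).
"""


def _text(value):
    # redis-py returns bytes unless the client was created with decode_responses=True.
    return value.decode("utf-8") if isinstance(value, bytes) else value


def snapshot_queue(conn, queue_name="default"):
    """Return queued and started job IDs plus each job's stored status.

    `conn` is anything exposing redis-py's lrange/zrange/hget methods,
    e.g. the Redis('my_queue') connection from the test in the question.
    """
    queued = [_text(j) for j in conn.lrange("rq:queue:" + queue_name, 0, -1)]
    started = [_text(j) for j in conn.zrange("rq:wip:" + queue_name, 0, -1)]
    statuses = {}
    for job_id in queued + started:
        # hget returns None if the job hash has expired or never existed.
        statuses[job_id] = _text(conn.hget("rq:job:" + job_id, "status"))
    return {"queued": queued, "started": started, "statuses": statuses}
```

Read this way, a job that sits under "started" and never leaves it was picked up by a worker and is hanging inside the job, whereas one that stays under "queued" was never dequeued at all.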
The debugging approach I am currently using is:
import time
from rq import Queue
from redis import Redis
from django_rq import get_worker
...
    def test_motion_alarm(self):
        motion_sensor_data = {"motion_detected": 1}
        post_alarm(
            self.live_server_url,
            self.location,
            self.sensor_device_id,
            "ALARM_MOTIONDETECTED",
            motion_sensor_data
        )
        # Redis() takes the host as its first argument; 'my_queue' is
        # presumably the hostname of the Redis container.
        redis_conn = Redis('my_queue')
        q = Queue(connection=redis_conn)
        print(len(q))
        queued_job_ids = q.job_ids
        queued_jobs = q.jobs
        logger.debug('RQ info: \njob IDs: {}, \njobs: {}'.format(queued_job_ids, queued_jobs))
        # Process everything currently queued, then return (burst mode).
        get_worker().work(burst=True)
        time.sleep(1)
        self.assertTrue(db.event_exists_at_location(
            db.get_location_by_motion_detected(self.location_id),
            "ALARM_MOTIONDETECTED"))
This produces the following debug output:
$ DEBUG [myapi.tests.integration.test_rules:436] RQ info:
job IDs: ['bef879c4-832d-431d-97e7-9eec9f4bf5d7']
jobs: [Job('bef879c4-832d-431d-97e7-9eec9f4bf5d7', enqueued_at=datetime.datetime(2018, 12, 6, 0, 10, 14, 829488))]
$ RQ worker 'rq:worker:54f6054e7aa5.7' started, version 0.12.0
$ *** Listening on default...
$ Cleaning registries for queue: default
$ default: myapi.engine.rules.process_event(<myapi.engine.event.Event object at 0x7fbf204e8c50>) (bef879c4-832d-431d-97e7-9eec9f4bf5d7)
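Since the worker output stops right after the job line, one hedged way to find out where the call is stuck is the standard-library faulthandler module, which can dump every thread's stack once a timeout expires. The sketch below is a small context manager around faulthandler.dump_traceback_later; the name dump_stacks_if_stuck is hypothetical and nothing here is part of rq or django-rq.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def dump_stacks_if_stuck(seconds, file=None):
    """Dump every thread's stack to `file` if the block is still running after `seconds`.

    Diagnostic only: with exit=False the block keeps running after the dump,
    so a genuinely hung call still has to be interrupted by hand.
    `file` must be a real file with a fileno() (default: sys.stderr).
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(seconds, repeat=False, file=target, exit=False)
    try:
        yield
    finally:
        # Disarm the watchdog whether the block finished or raised.
        faulthandler.cancel_dump_traceback_later()
```

In the test this would wrap the call that hangs, for example with dump_stacks_if_stuck(60): get_worker().work(burst=True). Note that rq's default Worker forks a separate work-horse process for each job, so a dump taken in the parent would most likely only show it waiting on that child; rq's SimpleWorker runs jobs in the calling process, so using it for this experiment should put the job's own frames in the trace.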
And in the queue container, with a monitor process running against the queue, I see a new batch of monitor responses every so often:
1544110882.343826 [0 172.19.0.4:38905] "EXPIRE" "rq:worker:ac50518f1c5e.7" "35"
1544110882.344304 [0 172.19.0.4:38905] "HSET" "rq:worker:ac50518f1c5e.7" "last_heartbeat" "2018-12-06T15:41:22.344170Z"
1544110882.968846 [0 172.19.0.4:38910] "EXPIRE" "rq:worker:ac50518f1c5e.12" "35"
1544110882.969651 [0 172.19.0.4:38910] "HSET" "rq:worker:ac50518f1c5e.12" "last_heartbeat" "2018-12-06T15:41:22.969181Z"
1544110884.122917 [0 172.19.0.4:38919] "EXPIRE" "rq:worker:ac50518f1c5e.13" "35"
1544110884.124966 [0 172.19.0.4:38919] "HSET" "rq:worker:ac50518f1c5e.13" "last_heartbeat" "2018-12-06T15:41:24.124809Z"
1544110884.708910 [0 172.19.0.4:38925] "EXPIRE" "rq:worker:ac50518f1c5e.14" "35"
1544110884.710736 [0 172.19.0.4:38925] "HSET" "rq:worker:ac50518f1c5e.14" "last_heartbeat" "2018-12-06T15:41:24.710599Z"
1544110885.415111 [0 172.19.0.4:38930] "EXPIRE" "rq:worker:ac50518f1c5e.15" "35"
1544110885.417279 [0 172.19.0.4:38930] "HSET" "rq:worker:ac50518f1c5e.15" "last_heartbeat" "2018-12-06T15:41:25.417155Z"
1544110886.028965 [0 172.19.0.4:38935] "EXPIRE" "rq:worker:ac50518f1c5e.16" "35"
1544110886.030002 [0 172.19.0.4:38935] "HSET" "rq:worker:ac50518f1c5e.16" "last_heartbeat" "2018-12-06T15:41:26.029817Z"
1544110886.700132 [0 172.19.0.4:38940] "EXPIRE" "rq:worker:ac50518f1c5e.17" "35"
1544110886.701861 [0 172.19.0.4:38940] "HSET" "rq:worker:ac50518f1c5e.17" "last_heartbeat" "2018-12-06T15:41:26.701716Z"
1544110887.359702 [0 172.19.0.4:38945] "EXPIRE" "rq:worker:ac50518f1c5e.18" "35"
1544110887.361642 [0 172.19.0.4:38945] "HSET" "rq:worker:ac50518f1c5e.18" "last_heartbeat" "2018-12-06T15:41:27.361481Z"
1544110887.966641 [0 172.19.0.4:38950] "EXPIRE" "rq:worker:ac50518f1c5e.19" "35"
1544110887.967931 [0 172.19.0.4:38950] "HSET" "rq:worker:ac50518f1c5e.19" "last_heartbeat" "2018-12-06T15:41:27.967760Z"
1544110888.595785 [0 172.19.0.4:38955] "EXPIRE" "rq:worker:ac50518f1c5e.20" "35"
1544110888.596962 [0 172.19.0.4:38955] "HSET" "rq:worker:ac50518f1c5e.20" "last_heartbeat" "2018-12-06T15:41:28.596799Z"
1544110889.199269 [0 172.19.0.4:38960] "EXPIRE" "rq:worker:ac50518f1c5e.21" "35"
1544110889.200416 [0 172.19.0.4:38960] "HSET" "rq:worker:ac50518f1c5e.21" "last_heartbeat" "2018-12-06T15:41:29.200265Z"
1544110889.783128 [0 172.19.0.4:38965] "EXPIRE" "rq:worker:ac50518f1c5e.22" "35"
1544110889.785444 [0 172.19.0.4:38965] "HSET" "rq:worker:ac50518f1c5e.22" "last_heartbeat" "2018-12-06T15:41:29.785158Z"
1544110890.422338 [0 172.19.0.4:38970] "EXPIRE" "rq:worker:ac50518f1c5e.23" "35"
1544110890.423470 [0 172.19.0.4:38970] "HSET" "rq:worker:ac50518f1c5e.23" "last_heartbeat" "2018-12-06T15:41:30.423314Z"
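To make a MONITOR capture like this easier to read, a short parser can group the commands by worker key. The sketch below is hypothetical (summarize_heartbeats is not part of rq) and only understands the EXPIRE/HSET heartbeat pattern shown above. Applied to this capture it would report thirteen distinct rq:worker:ac50518f1c5e.N keys, each refreshing its own heartbeat, which suggests several worker processes are still registered and alive rather than only the one started by the current test. As a further assumption to check against the rq 0.12 source: the TTL of 35 looks like job_monitoring_interval + 5 (30 + 5 seconds), the heartbeat a parent worker sends while waiting on an unfinished job, which would also fit the 30-second rhythm noted below.

```python
import re

# One line of `redis-cli MONITOR` output, e.g.
# 1544110882.343826 [0 172.19.0.4:38905] "EXPIRE" "rq:worker:ac50518f1c5e.7" "35"
MONITOR_LINE = re.compile(
    r'^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>[^\]]+)\] (?P<args>.*)$'
)
# A double-quoted argument, allowing backslash escapes inside it.
ARG = re.compile(r'"((?:[^"\\]|\\.)*)"')


def summarize_heartbeats(lines):
    """Map each rq worker key to its last EXPIRE TTL and last_heartbeat value."""
    workers = {}  # insertion-ordered on Python 3.7+
    for line in lines:
        match = MONITOR_LINE.match(line.strip())
        if not match:
            continue  # skip the "OK" banner and anything unparsable
        args = ARG.findall(match.group("args"))
        if len(args) < 2 or not args[1].startswith("rq:worker:"):
            continue
        command, key = args[0].upper(), args[1]
        entry = workers.setdefault(key, {"ttl": None, "last_heartbeat": None})
        if command == "EXPIRE" and len(args) >= 3:
            entry["ttl"] = int(args[2])
        elif command == "HSET" and len(args) >= 4 and args[2] == "last_heartbeat":
            entry["last_heartbeat"] = args[3]
    return workers
```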
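Oddly, though perhaps by design, every time I see a batch go by, it ends at :30 or :00 seconds.
So I can confirm that the queue does contain this item and that the job is running. Why, then, isn't the job picked up and run every time?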
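This appears to be a recently reported defect in the rq_scheduler library, reported here: https://github.com/rq/rq-scheduler/issues/197
There is a PR in the works for it. However, I noticed that we had allowed the redis library to be upgraded to 3.0.0 without explicitly requesting that version, and that is what ultimately broke the system.
In the build script, I set the Dockerfile to run RUN pip install redis=="2.10.6", which mitigated the issue for the time being.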
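To keep an unpinned dependency from silently upgrading again, the pin can be paired with a check that fails fast when the tests start. This is a minimal sketch assuming Python 3.8+ for importlib.metadata; the helper names are hypothetical, and the pin itself (redis==2.10.6 in requirements.txt or the Dockerfile) is what actually fixes the version.

```python
from importlib import metadata  # Python 3.8+


def redis_major_version(version_string):
    """Return the major version number from a string such as "2.10.6"."""
    return int(version_string.split(".")[0])


def assert_redis_py_below_3(version_string=None):
    """Raise if redis-py 3.x or later is installed; return the version otherwise.

    With no argument, reads the version of the installed "redis" distribution.
    """
    if version_string is None:
        version_string = metadata.version("redis")
    if redis_major_version(version_string) >= 3:
        raise RuntimeError(
            "redis-py %s is installed, but this project is pinned to "
            "redis==2.10.6" % version_string
        )
    return version_string
```

Calling assert_redis_py_below_3() from a test base class's setUp would surface an accidental upgrade as an explicit error rather than as a worker that silently freezes.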