ipyparallel displaying "registration: purging stalled registration"
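I am trying to use the ipyparallel library to run an ipcontroller and an ipengine on different machines.
My setup is as follows:
Remote machine:
Windows Server 2012 R2 x64, running an ipcontroller, listening on port 5900 with ip=0.0.0.0.
Local machine:
Windows 10 x64, running an ipengine, connecting to the remote machine's IP and port 5900.
Controller start command: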
ipcontroller --ip=0.0.0.0 --port=5900 --reuse --log-to-file=True
Engine start command:
ipengine --file=/c/Users/User/ipcontroller-engine.json --timeout=10 --log-to-file=True
I have changed the interface field in ipcontroller-engine.json from "tcp://127.0.0.1" to "tcp://" for the ipengine.
On startup, here is a snapshot of the ipcontroller log:
2016-10-10 01:14:00.651 [IPControllerApp] Hub listening on tcp://0.0.0.0:5900 for registration.
2016-10-10 01:14:00.677 [IPControllerApp] Hub using DB backend: 'DictDB'
2016-10-10 01:14:00.956 [IPControllerApp] hub::created hub
2016-10-10 01:14:00.957 [IPControllerApp] task::using Python leastload Task scheduler
2016-10-10 01:14:00.959 [IPControllerApp] Heartmonitor started
2016-10-10 01:14:00.967 [IPControllerApp] Creating pid file: C:\Users\Administrator\.ipython\profile_default\pid\ipcontroller.pid
2016-10-10 01:14:02.102 [IPControllerApp] client::client b'\x00\x80\x00\x00)' requested 'connection_request'
2016-10-10 01:14:02.102 [IPControllerApp] client::client [b'\x00\x80\x00\x00)'] connected
2016-10-10 01:14:47.895 [IPControllerApp] client::client b'82f5efed-52eb-46f2-8c92-e713aee8a363' requested 'registration_request'
2016-10-10 01:15:05.437 [IPControllerApp] client::client b'efe6919d-98ac-4544-a6b8-9d748f28697d' requested 'registration_request'
2016-10-10 01:15:17.899 [IPControllerApp] registration::purging stalled registration: 1
And the ipengine log:
2016-10-10 13:44:21.037 [IPEngineApp] Registering with controller at tcp://172.17.3.14:5900
2016-10-10 13:44:21.508 [IPEngineApp] Starting to monitor the heartbeat signal from the hub every 3010 ms.
2016-10-10 13:44:21.522 [IPEngineApp] Completed registration with id 1
2016-10-10 13:44:27.529 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (1 time(s) in a row).
2016-10-10 13:44:30.539 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (2 time(s) in a row).
...
2016-10-10 13:46:52.009 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (49 time(s) in a row).
2016-10-10 13:46:55.028 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (50 time(s) in a row).
2016-10-10 13:46:55.028 [IPEngineApp] CRITICAL | Maximum number of heartbeats misses reached (50 times 3010 ms), shutting down.
(The local machine and the remote VM have a 12.5-hour time difference.)
Any idea why this is happening?
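If you are using --reuse, make sure to delete the existing connection files whenever you change settings. When --reuse is given and you change something like --ip, the controller may not behave well, because the connection file can override your command-line arguments.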
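As a rough illustration of that cleanup step, here is a minimal Python sketch that lists (and optionally deletes) stale connection files before the controller is restarted. It assumes the files live in the default profile's security directory (~/.ipython/profile_default/security) and are named ipcontroller-*.json; both are assumptions about a default install, so adjust them for your profile. The function names are made up for this example.

```python
from pathlib import Path


def find_connection_files(security_dir):
    """Return the ipcontroller-*.json files in security_dir, sorted by name."""
    directory = Path(security_dir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("ipcontroller-*.json"))


def remove_connection_files(security_dir, dry_run=True):
    """Delete the connection files, or only report them when dry_run is True."""
    affected = []
    for path in find_connection_files(security_dir):
        if not dry_run:
            path.unlink()
        affected.append(path)
    return affected


if __name__ == "__main__":
    # Assumed default location; change this if you use another profile.
    default_dir = Path.home() / ".ipython" / "profile_default" / "security"
    for path in remove_connection_files(default_dir, dry_run=True):
        print("would remove:", path)
```

The script only prints what it would remove; pass dry_run=False once the list looks right, then restart ipcontroller so it writes fresh connection files.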
When setting --ip=0.0.0.0, it may also be useful to set --location=a.b.c.d, where a.b.c.d is an IP address of the controller that you know the engines can reach.
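If registration succeeds but subsequent connections fail, this may be because the firewall opens only one port, e.g. 5900. The machine running the controller needs to have all of the ports listed in the connection file open. You can pin these to a known port range by manually entering port numbers in the connection files.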
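One way to test the firewall theory from the engine's machine is to try a plain TCP connection to every port named in the connection file. The sketch below is only a diagnostic aid and not part of ipyparallel: it assumes that every integer value (or list of integers) in the JSON file is a port number, and the helper names are invented for this example.

```python
import json
import socket
import sys


def _is_port(value):
    """True for plain integers (bool is excluded, since it subclasses int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def extract_ports(info):
    """Map field name -> list of ports, assuming integer values are ports."""
    ports = {}
    for name, value in info.items():
        if _is_port(value):
            ports[name] = [value]
        elif isinstance(value, list) and value and all(_is_port(v) for v in value):
            ports[name] = list(value)
    return ports


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_connection_file(path, host, timeout=3.0):
    """Return {(field, port): reachable} for every port found in the file."""
    with open(path) as f:
        info = json.load(f)
    return {
        (name, port): port_is_open(host, port, timeout)
        for name, ports in extract_ports(info).items()
        for port in ports
    }


if __name__ == "__main__":
    # Usage: python check_ports.py <connection-file.json> <controller-ip>
    results = check_connection_file(sys.argv[1], sys.argv[2])
    for (name, port), ok in sorted(results.items()):
        print(f"{name:>14} {port:>6} {'open' if ok else 'BLOCKED'}")
```

If the registration port shows as open while the other ports show as blocked, that matches the symptom in the logs: the engine registers, then never sees a heartbeat.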