ipyparallel displaying "registration: purging stalled registration"

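I am trying to use the ipyparallel library to run an ipcontroller and an ipengine on different machines.

My setup is as follows:

Remote machine: Windows Server 2012 R2 x64, running an ipcontroller set up to listen on port 5900 with ip=0.0.0.0.

Local machine: Windows 10 x64, running an ipengine set up to connect to the remote machine's IP and port 5900.

Controller start command: ipcontroller --ip=0.0.0.0 --port=5900 --reuse --log-to-file=True

Engine start command: ipengine --file=/c/Users/User/ipcontroller-engine.json --timeout=10 --log-to-file=True

I have changed the interface field in ipcontroller-engine.json from "tcp://127.0.0.1" to "tcp://" for the ipengine.

On startup, here is a snapshot of the ipcontroller log: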

2016-10-10 01:14:00.651 [IPControllerApp] Hub listening on tcp://0.0.0.0:5900 for registration.
2016-10-10 01:14:00.677 [IPControllerApp] Hub using DB backend: 'DictDB'
2016-10-10 01:14:00.956 [IPControllerApp] hub::created hub
2016-10-10 01:14:00.957 [IPControllerApp] task::using Python leastload Task scheduler
2016-10-10 01:14:00.959 [IPControllerApp] Heartmonitor started
2016-10-10 01:14:00.967 [IPControllerApp] Creating pid file: C:\Users\Administrator\.ipython\profile_default\pid\ipcontroller.pid
2016-10-10 01:14:02.102 [IPControllerApp] client::client b'\x00\x80\x00\x00)' requested 'connection_request'
2016-10-10 01:14:02.102 [IPControllerApp] client::client [b'\x00\x80\x00\x00)'] connected
2016-10-10 01:14:47.895 [IPControllerApp] client::client b'82f5efed-52eb-46f2-8c92-e713aee8a363' requested 'registration_request'
2016-10-10 01:15:05.437 [IPControllerApp] client::client b'efe6919d-98ac-4544-a6b8-9d748f28697d' requested 'registration_request'
2016-10-10 01:15:17.899 [IPControllerApp] registration::purging stalled registration: 1

And the ipengine log:

2016-10-10 13:44:21.037 [IPEngineApp] Registering with controller at tcp://172.17.3.14:5900
2016-10-10 13:44:21.508 [IPEngineApp] Starting to monitor the heartbeat signal from the hub every 3010 ms.
2016-10-10 13:44:21.522 [IPEngineApp] Completed registration with id 1
2016-10-10 13:44:27.529 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (1 time(s) in a row).
2016-10-10 13:44:30.539 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (2 time(s) in a row).
...
2016-10-10 13:46:52.009 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (49 time(s) in a row).
2016-10-10 13:46:55.028 [IPEngineApp] WARNING | No heartbeat in the last 3010 ms (50 time(s) in a row).
2016-10-10 13:46:55.028 [IPEngineApp] CRITICAL | Maximum number of heartbeats misses reached (50 times 3010 ms), shutting down.

(The local machine and the remote VM have a 12.5-hour time difference.)

Any idea why this is happening?

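If you are using --reuse, make sure to delete the existing connection files whenever you change settings. The controller may not behave well when --reuse is given and you change something like --ip, because the old connection file may override your command-line arguments.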
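As a rough sketch of that cleanup step, the helper below deletes the connection files from a given directory before the controller is restarted. The two file names are an assumption based on the ipcontroller-engine.json name in the question and the usual client counterpart, and remove_stale_connection_files is a hypothetical helper, not part of ipyparallel.

```python
from pathlib import Path

# Assumed connection file names written by ipcontroller; adjust if yours differ.
CONNECTION_FILES = ("ipcontroller-engine.json", "ipcontroller-client.json")


def remove_stale_connection_files(security_dir):
    """Delete controller connection files in security_dir and return the removed paths."""
    removed = []
    for name in CONNECTION_FILES:
        path = Path(security_dir) / name
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


# Example call (the path is an assumption for a default profile):
# remove_stale_connection_files(Path.home() / ".ipython" / "profile_default" / "security")
```

Run it on the controller machine while the controller is stopped, then start ipcontroller again and copy the freshly written ipcontroller-engine.json to the engine machine.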

When setting --ip=0.0.0.0, it may also be useful to set --location=a.b.c.d, where a.b.c.d is an IP address of the controller that you know the engine can reach.
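One way to catch an unreachable address before starting the engine is to inspect the connection file. The sketch below is only a heuristic of my own, not ipyparallel's actual address resolution: it assumes the file has an "interface" field (as in the question) and a "location" field, and check_engine_address is a hypothetical name.

```python
import json

# Hosts that only make sense on the controller machine itself.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}
# Wildcard or missing hosts, which need a usable location to be resolved.
WILDCARD_HOSTS = {"", "*", "0.0.0.0"}


def check_engine_address(conn):
    """Return a list of warnings for a parsed ipcontroller-engine.json dict."""
    warnings = []
    # "tcp://1.2.3.4" -> "1.2.3.4"; "tcp://" -> ""
    host = conn.get("interface", "").split("://", 1)[-1]
    location = conn.get("location", "")  # assumed field name
    if host in LOOPBACK_HOSTS:
        warnings.append("interface is loopback; a remote engine cannot reach it")
    elif host in WILDCARD_HOSTS and not location:
        warnings.append("interface is a wildcard and no location is set")
    return warnings


# Example usage (file path taken from the engine command in the question):
# with open("/c/Users/User/ipcontroller-engine.json") as f:
#     print(check_engine_address(json.load(f)))
```

An empty list means only that neither heuristic fired; it does not prove the engine can actually connect.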

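If registration succeeds but subsequent connections fail, the cause may be a firewall that only opens one port, e.g. 5900. The machine running the controller needs to have all of the ports listed in the connection file open. You can pin these to a port range by manually entering port numbers in the connection file.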
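To see which ports the firewall has to allow, you can read them out of the connection file and probe each one from the engine machine. This is a minimal sketch that assumes ports are stored as integer values (or lists of integers) in the JSON; both function names are hypothetical and not part of ipyparallel.

```python
import json
import socket


def _is_port(value):
    """True for plain integers (bool is excluded because it subclasses int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def ports_in_connection_info(conn):
    """Map each key whose value is an integer, or a list of integers, to its ports."""
    ports = {}
    for key, value in conn.items():
        if _is_port(value):
            ports[key] = [value]
        elif isinstance(value, list) and value and all(_is_port(v) for v in value):
            ports[key] = list(value)
    return ports


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage from the engine machine (path and address taken from the question):
# with open("/c/Users/User/ipcontroller-engine.json") as f:
#     conn = json.load(f)
# for name, port_list in ports_in_connection_info(conn).items():
#     for port in port_list:
#         print(name, port, port_is_open("172.17.3.14", port))
```

If only the registration port reports as open, that would match the symptom in the logs: the engine completes registration and then never receives a heartbeat.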