Issue when joining Serf nodes located in different Docker containers
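Context: the host is AWS EC2 / Ubuntu 14.04.5 with Docker version 17.05.0-ce. The containers are built from the publicly available repo image cbhihe/serf-alpine-bash. All containers are on the same EC2 instance and share the same default bridge network with net interface "docker0".
I am trying to join nodes serfDC1 (id d4fd90692e18) and serfDC2 (id 6353e7f6134d) by passing commands from the host shell: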
$ docker exec serfDC1 serf agent -node=Node1 -bind=0.0.0.0:7946
==> Starting Serf agent...
==> Starting Serf agent RPC...
==> Serf agent running!
Node name: 'd4fd90692e18'
Bind addr: '0.0.0.0:7946'
RPC addr: '127.0.0.1:7373'
Encrypted: false
Snapshot: false
Profile: lan
==> Log data will now stream in as it occurs:
2017/06/04 00:01:10 [INFO] agent: Serf agent starting
2017/06/04 00:01:10 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
2017/06/04 00:01:11 [INFO] agent: Received event: member-join
^C
Having found that Node1's container IP is 172.17.0.4, I can issue the serf agent -join command for Node2:
$ docker exec serfDC2 serf agent -node=Node2 -join=172.17.0.4
==> Starting Serf agent...
==> Starting Serf agent RPC...
==> Serf agent running!
Node name: '6353e7f6134d'
Bind addr: '0.0.0.0:7946'
RPC addr: '127.0.0.1:7373'
Encrypted: false
Snapshot: false
Profile: lan
==> Joining cluster...(replay: false)
Join completed. Synced with 1 initial agents
==> Log data will now stream in as it occurs:
2017/06/04 00:18:35 [INFO] agent: Serf agent starting
2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: 6353e7f6134d 127.0.0.1
2017/06/04 00:18:35 [INFO] agent: joining: [172.17.0.4] replay: false
2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
2017/06/04 00:18:35 [INFO] agent: joined: 1 nodes
2017/06/04 00:18:36 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
2017/06/04 00:18:36 [INFO] agent: Received event: member-join
2017/06/04 00:18:37 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34876
2017/06/04 00:18:37 [ERR] memberlist: Failed TCP fallback ping: EOF
2017/06/04 00:18:37 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
2017/06/04 00:18:38 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
2017/06/04 00:18:39 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34879
2017/06/04 00:18:39 [ERR] memberlist: Failed TCP fallback ping: EOF
2017/06/04 00:18:40 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
2017/06/04 00:18:41 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
2017/06/04 00:18:42 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34881
2017/06/04 00:18:42 [ERR] memberlist: Failed TCP fallback ping: EOF
2017/06/04 00:18:42 [INFO] memberlist: Marking d4fd90692e18 as failed, suspect timeout reached (0 peer confirmations)
2017/06/04 00:18:42 [INFO] serf: EventMemberFailed: d4fd90692e18 127.0.0.1
2017/06/04 00:18:43 [INFO] agent: Received event: member-failed
2017/06/04 00:18:44 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
2017/06/04 00:19:05 [INFO] serf: attempting reconnect to d4fd90692e18 127.0.0.1:7946
^C
This results in a failed join, as shown by:
$ docker exec serfDC2 serf members
6353e7f6134d 127.0.0.1:7946 alive
d4fd90692e18 127.0.0.1:7946 failed
$ docker exec serfDC1 serf members
d4fd90692e18 127.0.0.1:7946 alive
6353e7f6134d 127.0.0.1:7946 failed
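I have been at this for quite a while and I am at my wits' end as to where to turn next. The HashiCorp and Docker documentation does not seem to cover this aspect of the initial handshake between two Serf agents in different containers.
Can anybody tell me where I took a wrong turn? Any answer would be welcome, really. Thanks.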
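Serf nodes need to 'announce' themselves with a routable address. In your case they tell each other 'Hi, I am localhost:...', so each one tries to reply to localhost, which is wrong because every container has its own localhost.
There is an option that configures the agent to advertise the eth0 IP to the other nodes in the network: -iface. You then need to drop the -bind option. The ports are the defaults, so there is no need to customize them.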
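A quick way to spot this condition is to scan the `serf members` output for loopback addresses. The following is only a minimal sketch: the helper name `loopback_members` is made up for illustration, and it assumes the column layout shown in the question (name, address:port, status).

```shell
#!/bin/sh
# Read `serf members` output on stdin and print the name of every member
# whose advertised address is a loopback address (127.x.x.x).
# Assumes whitespace-separated columns: name, address:port, status.
loopback_members() {
    awk '$2 ~ /^127\./ { print $1 }'
}

# Usage (container name taken from the question):
#   docker exec serfDC1 serf members | loopback_members
```

Any name printed here belongs to an agent that is advertising an address other containers cannot route to.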
So, for node 1:
serf agent -node=Node1 -iface=eth0
For node 2:
serf agent -node=Node2 -join=172.17.0.2 -iface=eth0
From the docs:
-iface - This flag can be used to provide a binding interface. It can be used instead of -bind if the interface is known but not the address.
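The join address passed to node 2 is the IP Docker assigned to node 1's container, so it can be looked up instead of typed by hand. A minimal sketch, assuming the container is attached only to the default bridge network; `container_ip` is a hypothetical helper name, while the `docker inspect -f` template is the standard one for reading a container's network addresses.

```shell
#!/bin/sh
# Print the IP address Docker assigned to the container named in $1.
# If the container is attached to several networks, the addresses are
# printed back to back, so this sketch assumes a single network.
container_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# Usage (container names taken from the question):
#   docker exec serfDC2 serf agent -node=Node2 -iface=eth0 -join="$(container_ip serfDC1)"
```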
It works fine for me:
Node 1:
==> Log data will now stream in as it occurs:
2017/06/04 01:56:40 [INFO] agent: Serf agent starting
2017/06/04 01:56:40 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
2017/06/04 01:56:41 [INFO] agent: Received event: member-join
2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
2017/06/04 01:57:03 [INFO] agent: Received event: member-join
Node 2:
==> Log data will now stream in as it occurs:
2017/06/04 01:57:02 [INFO] agent: Serf agent starting
2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
2017/06/04 01:57:02 [INFO] agent: joining: [172.17.0.2] replay: false
2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
2017/06/04 01:57:02 [INFO] agent: joined: 1 nodes
2017/06/04 01:57:03 [INFO] agent: Received event: member-join
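Edit:
If each container is in its own VM (EC2 instance), each instance has its own Docker network and there is no interconnection between them, so you must provide the EC2 instance IP and expose the corresponding port. Use -advertise: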
-advertise - The advertise flag is used to change the address that we advertise to other nodes in the cluster.
Node 1:
serf agent -node=Node1 -iface=eth0 -advertise=INSTANCE_IP
Node 2:
serf agent -node=Node2 -join=NODE1_INSTANCE_IP -iface=eth0
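And remember to publish the Serf port in docker run (Serf gossips over both TCP and UDP on this port, so you may also need -p 7946:7946/udp):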
docker run -p 7946:7946 (...rest of the command...)
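The flag combinations used in this answer can be collected into one small helper. This is an illustrative sketch only: `serf_agent_cmd` is a made-up name, and it merely prints the command line built from the `-node`, `-iface`, `-advertise` and `-join` flags discussed above without running anything.

```shell
#!/bin/sh
# Print a `serf agent` command line.
# Arguments: node name, interface, advertise address, join address.
# The last two may be empty strings, in which case the flag is left out.
serf_agent_cmd() {
    printf 'serf agent -node=%s -iface=%s' "$1" "$2"
    if [ -n "${3:-}" ]; then printf ' -advertise=%s' "$3"; fi
    if [ -n "${4:-}" ]; then printf ' -join=%s' "$4"; fi
    printf '\n'
}

# Usage (INSTANCE_IP stands for the EC2 instance address, as above):
#   serf_agent_cmd Node1 eth0 INSTANCE_IP ''
#   serf_agent_cmd Node2 eth0 '' NODE1_INSTANCE_IP
```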