Issue when joining serf nodes located in different Docker containers

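Context: the host is AWS EC2 running Ubuntu 14.04.5 with Docker version 17.05.0-ce. The containers are built from the publicly available repo image cbhihe/serf-alpine-bash. All containers are located on the same EC2 instance and share the same default bridge network, with net-interface "docker0".

I am trying to join nodes serfDC1 (id d4fd90692e18) and serfDC2 (id 6353e7f6134d) by passing commands from the host shell:
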
$ docker exec serfDC1 serf agent -node=Node1 -bind=0.0.0.0:7946
==> Starting Serf agent...
==> Starting Serf agent RPC...
==> Serf agent running!
         Node name: 'd4fd90692e18'
         Bind addr: '0.0.0.0:7946'
          RPC addr: '127.0.0.1:7373'
         Encrypted: false
          Snapshot: false
           Profile: lan
==> Log data will now stream in as it occurs:
    2017/06/04 00:01:10 [INFO] agent: Serf agent starting
    2017/06/04 00:01:10 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
    2017/06/04 00:01:11 [INFO] agent: Received event: member-join
    ^C
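
The join step that follows needs Node1's address on the bridge network. One way to look it up from the host is sketched below. The helper name container_ip is made up for illustration; the docker inspect template is the usual one for listing a container's per-network IP addresses.

```shell
# Print the IP address Docker assigned to a container on each attached network.
# With a single network (the default bridge here) this is one address.
# Usage: container_ip <container-name-or-id>, e.g. container_ip serfDC1
# container_ip is a hypothetical helper name, not part of Docker or Serf.
container_ip() {
  docker inspect \
    --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' \
    "$1"
}
```

Run against serfDC1, this should print the address that is then passed to -join.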

After finding that Node1's container IP is 172.17.0.4, the serf agent -join command can be issued for Node2:

$ docker exec serfDC2 serf agent -node=Node2 -join=172.17.0.4
==> Starting Serf agent...
==> Starting Serf agent RPC...
==> Serf agent running!
         Node name: '6353e7f6134d'
         Bind addr: '0.0.0.0:7946'
          RPC addr: '127.0.0.1:7373'
         Encrypted: false
          Snapshot: false
           Profile: lan
==> Joining cluster...(replay: false)
    Join completed. Synced with 1 initial agents
==> Log data will now stream in as it occurs:
    2017/06/04 00:18:35 [INFO] agent: Serf agent starting
    2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: 6353e7f6134d 127.0.0.1
    2017/06/04 00:18:35 [INFO] agent: joining: [172.17.0.4] replay: false
    2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
    2017/06/04 00:18:35 [INFO] agent: joined: 1 nodes
    2017/06/04 00:18:36 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
    2017/06/04 00:18:36 [INFO] agent: Received event: member-join
    2017/06/04 00:18:37 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34876
    2017/06/04 00:18:37 [ERR] memberlist: Failed TCP fallback ping: EOF
    2017/06/04 00:18:37 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
    2017/06/04 00:18:38 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
    2017/06/04 00:18:39 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34879
    2017/06/04 00:18:39 [ERR] memberlist: Failed TCP fallback ping: EOF
    2017/06/04 00:18:40 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
    2017/06/04 00:18:41 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
    2017/06/04 00:18:42 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34881
    2017/06/04 00:18:42 [ERR] memberlist: Failed TCP fallback ping: EOF
    2017/06/04 00:18:42 [INFO] memberlist: Marking d4fd90692e18 as failed, suspect timeout reached (0 peer confirmations)
    2017/06/04 00:18:42 [INFO] serf: EventMemberFailed: d4fd90692e18 127.0.0.1
    2017/06/04 00:18:43 [INFO] agent: Received event: member-failed
    2017/06/04 00:18:44 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
    2017/06/04 00:19:05 [INFO] serf: attempting reconnect to d4fd90692e18 127.0.0.1:7946
   ^C

This results in a failed join, as shown:

$ docker exec serfDC2 serf members
6353e7f6134d  127.0.0.1:7946  alive
d4fd90692e18  127.0.0.1:7946  failed  
$ docker exec serfDC1 serf members
d4fd90692e18  127.0.0.1:7946  alive 
6353e7f6134d  127.0.0.1:7946  failed
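
The telling detail in the member lists above is that every node is known by a 127.0.0.1 address. A minimal sketch for spotting this automatically, assuming the three-column "name address:port status" layout shown above (the function name loopback_members is hypothetical):

```shell
# Read `serf members` output on stdin and print the name of every member
# whose advertised address is in the loopback range (127.x.x.x).
# loopback_members is a hypothetical helper name.
loopback_members() {
  awk '$2 ~ /^127\./ { print $1 }'
}
```

It could be used as `docker exec serfDC2 serf members | loopback_members`. Any name it prints is a member that other containers cannot reach at the address it advertises.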

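I have been at this for a long time and I am at my wits' end as to where to turn next. The HashiCorp and Docker documentation does not seem to cover this aspect of the initial handshake between two Serf agents in different containers.

Can anyone tell me where I took a wrong turn? Any answer would be welcome, really.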

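Serf nodes need to advertise a routable address for themselves. In your case they are telling each other "Hi, I'm localhost:...", so each one tries to reply to localhost, which is wrong because every container has its own localhost.

There is an option that makes the agent advertise its eth0 IP to the other nodes in the network: -iface. You then need to drop the -bind option. The ports are the default ones, so there is no need to customize them.

So, for node 1: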

serf agent -node=Node1 -iface=eth0

For node 2:

serf agent -node=Node2 -join=172.17.0.2 -iface=eth0

From the docs:

-iface - This flag can be used to provide a binding interface. It can be used instead of -bind if the interface is known but not the address.
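
Since the question drives both containers from the host shell, the two commands can be combined into one helper. This is only a sketch: start_pair is a hypothetical name, the container names serfDC1 and serfDC2 are taken from the question, and the agents are started detached with docker exec -d so the host shell is not blocked.

```shell
# Start Node1, look up its bridge-network address, then start Node2 and join it.
# start_pair is a hypothetical helper; container names come from the question.
start_pair() {
  # Node1 binds to (and advertises) whatever address eth0 has in its container.
  docker exec -d serfDC1 serf agent -node=Node1 -iface=eth0

  # Ask Docker for Node1's address instead of hard-coding it.
  node1_ip=$(docker inspect \
    --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' serfDC1)

  # Node2 joins Node1 at that address, also using its own eth0.
  docker exec -d serfDC2 serf agent -node=Node2 -join="$node1_ip" -iface=eth0
}
```

If Node1 is not yet listening when Node2 starts, the join may fail; a short delay between the two docker exec calls is one simple workaround.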

It works fine for me:

Node 1:

==> Log data will now stream in as it occurs:

    2017/06/04 01:56:40 [INFO] agent: Serf agent starting
    2017/06/04 01:56:40 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
    2017/06/04 01:56:41 [INFO] agent: Received event: member-join
    2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
    2017/06/04 01:57:03 [INFO] agent: Received event: member-join

Node 2:

==> Log data will now stream in as it occurs:

    2017/06/04 01:57:02 [INFO] agent: Serf agent starting
    2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
    2017/06/04 01:57:02 [INFO] agent: joining: [172.17.0.2] replay: false
    2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
    2017/06/04 01:57:02 [INFO] agent: joined: 1 nodes
    2017/06/04 01:57:03 [INFO] agent: Received event: member-join
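
Besides reading the logs, the result can be confirmed from the host with serf members. A small sketch that turns that output into a pass/fail check, assuming the "name address:port status" column layout shown in the question (all_alive is a hypothetical name):

```shell
# Read `serf members` output on stdin.
# Succeed (exit 0) only if at least one member is listed and every
# listed member has status "alive"; fail (exit 1) otherwise.
# all_alive is a hypothetical helper name.
all_alive() {
  awk 'NF { n++; if ($3 != "alive") bad++ }
       END { exit !(n > 0 && bad == 0) }'
}
```

For example: `docker exec serfDC1 serf members | all_alive && echo "cluster healthy"`.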

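Edit:

If each container runs in its own VM (EC2 instance), then each instance has its own Docker network and the networks are not interconnected, so you must provide the EC2 instance IP and expose the corresponding ports. Use -advertise: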

-advertise - The advertise flag is used to change the address that we advertise to other nodes in the cluster.

Node 1:

serf agent -node=Node1 -iface=eth0 -advertise=INSTANCE_IP

Node 2:

serf agent -node=Node2 -join=NODE1_INSTANCE_IP -iface=eth0

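And remember to publish the Serf port in docker run. Serf uses both TCP and UDP on this port, so publish both:

docker run -p 7946:7946 -p 7946:7946/udp (...rest of the command...)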