Unable to access ports from services across nodes in overlay network in swarm mode

I'm using the following compose file for a stack deployment:

version: '3.8'
x-deploy: &Deploy
  replicas: 1
  placement: &DeployPlacement
    max_replicas_per_node: 1
  restart_policy:
    max_attempts: 15
    window: 60s
  resources: &DeployResources
    reservations: &DeployResourcesReservations
      cpus: '0.05'
      memory: 10M
services:
  serv1:
    image: alpine
    networks:
      - test_nw
    deploy:
      <<: *Deploy
    entrypoint: ["tail", "-f", "/dev/null"]
  serv2:
    image: nginx
    networks:
      - test_nw
    deploy:
      <<: *Deploy
      placement:
        <<: *DeployPlacement
        constraints:
          - "node.role!=manager"
    expose: # deprecated, but I leave it here anyway
      - "80"
networks:
  test_nw:
    name: test_nw
    driver: overlay
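Side note on the anchors in this file: the YAML merge key << is a shallow merge, and keys written locally take precedence over merged ones. That is why serv2 has to re-merge *DeployPlacement inside its own placement block; without it, max_replicas_per_node would be dropped for serv2. The sketch below models that expansion with Python dict unpacking, which behaves the same way for this case. It is an illustration of the result, not what the Compose parser literally runs, and the variable names are made up.

```python
# Plain-dict model of the compose file's anchors. The YAML loader performs this
# expansion itself; this only illustrates what each service ends up with.
deploy_placement = {"max_replicas_per_node": 1}          # &DeployPlacement
deploy = {                                                # &Deploy
    "replicas": 1,
    "placement": deploy_placement,
    "restart_policy": {"max_attempts": 15, "window": "60s"},
    "resources": {"reservations": {"cpus": "0.05", "memory": "10M"}},
}

# serv1: "<<: *Deploy" with no local overrides, identical to the anchor.
serv1_deploy = {**deploy}

# serv2: the local "placement" key replaces the anchored one entirely (shallow
# merge), so *DeployPlacement is merged again to keep max_replicas_per_node.
serv2_deploy = {
    **deploy,
    "placement": {**deploy_placement, "constraints": ["node.role!=manager"]},
}

# What serv2 would get if the inner "<<: *DeployPlacement" were omitted.
serv2_without_inner_merge = {
    **deploy,
    "placement": {"constraints": ["node.role!=manager"]},
}

if __name__ == "__main__":
    print(serv2_deploy["placement"])
    print(serv2_without_inner_merge["placement"])
```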

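For convenience, for the rest of this post I'll refer to test_serv1 running in container1 on host1 and test_serv2 running in container2 on host2, since the actual hostnames and container names keep changing for me.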

When I open a shell in test_serv1 and ping serv2, this happens:

ubuntu@host1:~$ sudo docker exec -it test_serv1.1.container1 ash
/ # ping serv2
PING serv2 (10.0.7.5): 56 data bytes
64 bytes from 10.0.7.5: seq=0 ttl=64 time=0.084 ms

However, inspecting container2 shows that its IP is 10.0.7.6:

ubuntu@host2:~$ sudo docker inspect test_serv2.1.container2
[
    {
****************
        "NetworkSettings": {
            "Bridge": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "80/tcp": null
            },
****************
            "Networks": {
                "test_nw": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.7.6"
                    },
                    "Links": null,
                    "Aliases": [
                        "80c06bb29a42"
                    ],
                    "NetworkID": "sp56aiqxnt56yglsd8mc1zqpv",
                    "EndpointID": "dac52f1d7fa148f5acac20f89d6b709193b3c11fc90201424cd052785121e706",
                    "Gateway": "",
                    "IPAddress": "10.0.7.6",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:00:07:06",
****************
            }
        }
    }
]
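The inspect output shows the split: the task's own address on test_nw is 10.0.7.6, while the name serv2 resolved to 10.0.7.5, which should be the service's virtual IP (docker service inspect test_serv2 ought to list it under Endpoint.VirtualIPs). A small hypothetical helper, network_ips, that pulls the per-network addresses out of docker inspect JSON so the two can be compared without scrolling; it only reads the NetworkSettings.Networks.<name>.IPAddress fields visible above.

```python
import json
import sys


def network_ips(inspect_json: str) -> dict[str, dict[str, str]]:
    """Map container name -> {network name -> IP address} from `docker inspect` output."""
    result: dict[str, dict[str, str]] = {}
    for container in json.loads(inspect_json):  # docker inspect prints a JSON array
        networks = container.get("NetworkSettings", {}).get("Networks") or {}
        name = container.get("Name", "").lstrip("/")  # container names start with "/"
        result[name] = {net: cfg.get("IPAddress", "") for net, cfg in networks.items()}
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: sudo docker inspect test_serv2.1.container2 > c2.json
    #        python3 network_ips.py c2.json
    with open(sys.argv[1]) as f:
        for name, nets in network_ips(f.read()).items():
            for net, ip in nets.items():
                print(f"{name}\t{net}\t{ip}")
```

The file name c2.json is arbitrary; any saved inspect output works.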

I can see that container2 is listening on port 80 on all interfaces, can itself ping both 10.0.7.5 and 10.0.7.6 (!!), and can reach port 80 on both IPs (!!).

ubuntu@host2:~$ sudo docker exec -it test_serv2.1.container2 bash
root@80c06bb29a42:/# ping 10.0.7.5
PING 10.0.7.5 (10.0.7.5) 56(84) bytes of data.
64 bytes from 10.0.7.5: icmp_seq=1 ttl=64 time=0.093 ms
64 bytes from 10.0.7.5: icmp_seq=2 ttl=64 time=0.094 ms
^C
--- 10.0.7.5 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 8ms
rtt min/avg/max/mdev = 0.093/0.093/0.094/0.009 ms
root@80c06bb29a42:/# ping 10.0.7.6
PING 10.0.7.6 (10.0.7.6) 56(84) bytes of data.
64 bytes from 10.0.7.6: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 10.0.7.6: icmp_seq=2 ttl=64 time=0.059 ms
64 bytes from 10.0.7.6: icmp_seq=3 ttl=64 time=0.053 ms
^C
--- 10.0.7.6 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 50ms
rtt min/avg/max/mdev = 0.035/0.049/0.059/0.010 ms
root@80c06bb29a42:/# netstat -tuplen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name    
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      0          33110      1/nginx: master pro 
tcp        0      0 127.0.0.11:35491        0.0.0.0:*               LISTEN      0          32855      -                   
tcp6       0      0 :::80                   :::*                    LISTEN      0          33111      1/nginx: master pro 
udp        0      0 127.0.0.11:43477        0.0.0.0:*                           0          32854      -                   
root@80c06bb29a42:/# curl 10.0.7.5:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@80c06bb29a42:/# curl 10.0.7.6:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@80c06bb29a42:/# 

However, when I try the following from container1, I just want to throw my laptop at the wall, because I can't figure out why nobody else has run into this and/or posted about it :/

ubuntu@host1:~$ sudo docker exec -it test_serv1.1.container1 ash
/ # ping serv2
PING serv2 (10.0.7.5): 56 data bytes
64 bytes from 10.0.7.5: seq=0 ttl=64 time=0.084 ms
64 bytes from 10.0.7.5: seq=1 ttl=64 time=0.086 ms
^C
--- serv2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.084/0.085/0.086 ms
/ # curl serv2:80
^C
/ # curl --max-time 10 serv2:80
curl: (28) Connection timed out after 10001 milliseconds
/ # ping test_serv2
PING test_serv2 (10.0.7.5): 56 data bytes
64 bytes from 10.0.7.5: seq=0 ttl=64 time=0.071 ms
64 bytes from 10.0.7.5: seq=1 ttl=64 time=0.064 ms
64 bytes from 10.0.7.5: seq=2 ttl=64 time=0.125 ms
^C
--- test_serv2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.064/0.086/0.125 ms
/ # curl --max-time 10 test_serv2:80
curl: (28) Connection timed out after 10001 milliseconds
/ # ping 10.0.7.6
PING 10.0.7.6 (10.0.7.6): 56 data bytes
^C
--- 10.0.7.6 ping statistics ---
87 packets transmitted, 0 packets received, 100% packet loss
/ # curl --max-time 10 10.0.7.6:80
curl: (28) Connection timed out after 10001 milliseconds
/ # 
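One reading of this session, not confirmed anywhere in the thread: the ping to 10.0.7.5 succeeds with loopback-like latency because the service VIP is answered within host1's own networking, whereas the direct ping to the task IP 10.0.7.6 has to cross to host2 over the overlay and never gets a reply. Docker's embedded DNS resolves serv2 to the VIP and tasks.serv2 to the individual task IPs, so probing both names side by side separates the two cases. A sketch assuming a Python 3 interpreter inside a container on test_nw (the alpine image used for serv1 does not ship one); resolve_ipv4, tcp_open and probe are hypothetical names.

```python
import socket
import sys


def resolve_ipv4(name: str) -> list[str]:
    """Distinct IPv4 addresses `name` resolves to (empty list if it does not resolve)."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_open(ip: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP handshake to ip:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals and unreachable hosts
        return False


def probe(names: list[str], port: int, timeout: float = 3.0) -> dict[str, dict[str, bool]]:
    """For each name, map every resolved address to whether `port` answers."""
    return {name: {ip: tcp_open(ip, port, timeout) for ip in resolve_ipv4(name)} for name in names}


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python3 probe.py 80 serv2 tasks.serv2
    port = int(sys.argv[1])
    for name, results in probe(sys.argv[2:], port).items():
        print(name, results or "does not resolve")
```

If the addresses returned for tasks.serv2 time out from host1 but answer from host2, the problem is most likely the node-to-node path rather than nginx or DNS.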

I've checked that all Docker ports (TCP 2376, 2377, 7946, 80 and UDP 7946, 4789) are open on both nodes.
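For reference, Docker's swarm documentation lists TCP 2377, TCP and UDP 7946, and UDP 4789 as the ports nodes need between each other; 2376 is the TLS port of the Docker API and is not required by swarm itself. A TCP connect from the other node can confirm the TCP ones, but UDP has no handshake to test, and UDP 4789 is the port that carries the overlay traffic. A minimal sketch under those assumptions (check_peer is a hypothetical name; 2377 only answers on manager nodes):

```python
import socket
import sys

# Node-to-node ports from Docker's swarm documentation. 2376/tcp (TLS Docker
# API) and 80/tcp from the question are not part of this set.
SWARM_PORTS = {
    (2377, "tcp"): "cluster management (managers only)",
    (7946, "tcp"): "communication among nodes",
    (7946, "udp"): "communication among nodes",
    (4789, "udp"): "overlay network traffic (VXLAN)",
}


def check_peer(peer: str, ports: dict = SWARM_PORTS, timeout: float = 3.0) -> dict:
    """TCP-connect to each TCP port on `peer`; UDP ports are reported as unverified."""
    report = {}
    for (port, proto) in ports:
        if proto == "udp":
            report[(port, proto)] = "unverified (UDP has no handshake)"
            continue
        try:
            with socket.create_connection((peer, port), timeout=timeout):
                report[(port, proto)] = "open"
        except OSError:
            report[(port, proto)] = "closed or filtered"
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 check_peer.py <other node's address>
    for (port, proto), status in check_peer(sys.argv[1]).items():
        print(f"{port}/{proto}\t{SWARM_PORTS[(port, proto)]}\t{status}")
```

For UDP 4789 a packet capture is more telling: run something like sudo tcpdump -ni <interface> udp port 4789 on host2 while pinging 10.0.7.6 from container1. If nothing arrives, the VXLAN packets are most likely being dropped somewhere between the hosts (host firewall, cloud security group or provider network).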

What is going on here?? Any help is greatly appreciated!

I'm posting this for anyone who might come across this later, since there's no answer yet.

A few things to consider (although they are all touched on in the question):

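  1. Make sure all ports are open, again. Even if you've set them up once, check iptables thoroughly. The Docker engine seems to change the configuration, and opening ports after Docker has started can sometimes leave things in an unusable state (a restart doesn't fix it; you need to hard stop Docker -> reset iptables -> start docker-ce).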
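  2. Make sure the local IP addresses of your machines don't conflict. This is a big one. I can't describe it in detail, but you can read up on the various IP address classes and check whether anything conflicts.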
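  3. Probably the most trivial but almost always overlooked instruction: always initialize or join the swarm with --advertise-addr and --listen-addr. --advertise-addr should be a public-facing IP address (even if it isn't internet-facing, it is the IP address other hosts use to reach this host). --listen-addr is not documented in much detail, but it must be the IP of the interface Docker should bind to.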
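Item 2 has one concrete form worth ruling out: unless --default-addr-pool is passed to docker swarm init, overlay subnets are carved from 10.0.0.0/8 in /24 blocks (test_nw got 10.0.7.0/24 here), so a host or VPC network anywhere in 10.x can overlap them. A sketch using Python's ipaddress module; the host CIDRs are assumed to come from something like ip -4 addr on each node, and conflicts is a hypothetical name.

```python
import ipaddress

# Docker's default swarm address pool when --default-addr-pool is not set;
# each overlay network gets a /24 from it.
DEFAULT_SWARM_POOL = "10.0.0.0/8"


def conflicts(host_cidrs, overlay_cidrs=(DEFAULT_SWARM_POOL,)):
    """Return (host_cidr, overlay_cidr) pairs whose address ranges overlap."""
    hits = []
    for h in host_cidrs:
        # strict=False accepts interface-style values such as "10.0.7.20/24".
        host_net = ipaddress.ip_network(h, strict=False)
        for o in overlay_cidrs:
            if host_net.overlaps(ipaddress.ip_network(o, strict=False)):
                hits.append((h, o))
    return hits
```

For example, conflicts(["10.0.7.20/24"]) (a made-up host address) returns [("10.0.7.20/24", "10.0.0.0/8")], while a 192.168.x.x host network returns an empty list.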

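Once you've done all of the above, note that AWS EC2 doesn't play well with cross-provider hosts. If your machines are spread across several providers (e.g. IBM, Azure, GCP), EC2 plays spoilsport. I'm curious how it does this (it must be some low-level network interference), but I spent quite a lot of time trying to get it to work and it wouldn't.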
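Related to item 3 and the EC2 remark: a quick way to see which local address a node would use to reach another is to connect() a UDP socket, which selects a route and source address without sending any packet. If that address is a private one the other node cannot reach (on EC2 the public IPv4 address is normally mapped by NAT rather than configured on the instance's interface), advertising it with --advertise-addr cannot work across providers. That is a plausible factor in the behaviour described above, not something this thread confirms; source_ip_towards is a hypothetical helper.

```python
import socket
import sys


def source_ip_towards(peer: str, port: int = 2377) -> str:
    """Local IPv4 address the kernel would use as the source when talking to `peer`.

    connect() on a UDP socket only picks a route and source address; nothing is sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer, port))
        return s.getsockname()[0]


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 source_ip.py <other node's address>
    print(source_ip_towards(sys.argv[1]))
```

Running it on each node against the other node's address shows whether the address you plan to pass to --advertise-addr is actually the one the host routes from.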