Swarm mode routing mesh not working, instead is working like host mode by default
Description
Swarm mode routing mesh is not working; instead, it behaves as if it were using host mode by default.
We are deploying a swarm of 3 masters and 8 workers, each on a different instance of an OpenStack cloud provider, provisioned with Terraform and Ansible. The swarm and the routing mesh were working fine until the mesh suddenly stopped working and started behaving like host mode. We did not change anything, run any updates, or deploy new services. We tried restarting the swarm and re-deploying the swarm and all services, but nothing worked and we could not get the routing mesh back. So we decided to destroy all instances and start from scratch (that setup is what is reported below). We did a clean install of Ubuntu 18.04 LTS and Docker, as before. Then we set up 1 master and 2 workers (this time manually) and deployed a service, but the swarm still behaves like host mode.
The only way to reach a service is through the IP address of the node where it is running; otherwise there is no answer (timeout). We tried to reach the service using the IP of the manager or of the other worker instance, but it is unreachable. That is why we assume the swarm is using host mode by default instead of the ingress network and the routing mesh.
We also tried different services, such as Mongo or Cassandra, but the behavior is the same: the swarm appears to work in host mode, and a service can only be reached through the IP address of the instance it is running on.
Any idea how to get around host mode and go back to the routing mesh?
We would like to access any service using only the IP addresses of the manager nodes, which should be in Drain mode.
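For reference, draining a manager so that it no longer runs service tasks but still participates in the ingress routing mesh is a one-liner; a minimal sketch, using the node name nh-manager-0 that appears in the docker info output below:
# stop scheduling tasks on the manager; it keeps routing ingress traffic
sudo docker node update --availability drain nh-manager-0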
Steps to reproduce the issue:
- [manager]
sudo docker swarm init --advertise-addr 158.39.201.14
- [worker-0]
sudo docker swarm join --token SWMTKN-1-3np0cy0msnfurecckl4863hkftykuqkgeq998s1hix6jsoiarq-758o52hymaiyzv74w3u1yzltt 158.39.201.14:2377
- [worker-1]
sudo docker swarm join --token SWMTKN-1-3np0cy0msnfurecckl4863hkftykuqkgeq998s1hix6jsoiarq-758o52hymaiyzv74w3u1yzltt 158.39.201.14:2377
- [manager] sudo docker stack deploy -c docker-compose.yml nh
Describe the results you received:
curl http://[worker-0-ip]:8089/bigdata 200 OK
curl http://[worker-1-ip]:8089/bigdata fails with a timeout
Describe the results you expected:
curl http://[worker-0-ip]:8089/bigdata 200 OK
curl http://[worker-1-ip]:8089/bigdata 200 OK
Additional information you deem important (e.g. the issue happens only occasionally):
The issue was not happening 2 days ago; it started suddenly. We made no modifications and did not touch the servers.
docker-compose.yml
version: '3.7'

networks:
  news-hunter:
    name: &network news-hunter

x-network: &network-base
  networks:
    - *network

services:
  blazegraph:
    <<: *network-base
    image: lyrasis/blazegraph:2.1.5
    ports:
      - published: 8089
        target: 8080
    deploy:
      placement:
        constraints:
          - node.role == worker
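After deploying, a port published through the ingress network shows up with a leading asterisk in docker service ls (as in the output further down); a quick check, under the assumption that a host-mode publication would not be listed with the asterisk:
# *:8089->8080/tcp indicates an ingress (routing mesh) publication
sudo docker service ls --filter name=nh_blazegraph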
IP tables on the manager, worker-1 and worker-2 (all identical): sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy DROP)
target prot opt source destination
DOCKER-USER all -- anywhere anywhere
DOCKER-INGRESS all -- anywhere anywhere
DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
DROP all -- anywhere anywhere
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain DOCKER (2 references)
target prot opt source destination
Chain DOCKER-INGRESS (1 references)
target prot opt source destination
ACCEPT tcp -- anywhere anywhere tcp dpt:8089
ACCEPT tcp -- anywhere anywhere state RELATED,ESTABLISHED tcp spt:8089
RETURN all -- anywhere anywhere
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere
DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-ISOLATION-STAGE-2 (2 references)
target prot opt source destination
DROP all -- anywhere anywhere
DROP all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- anywhere anywhere
Manager ports: sudo netstat -tuplen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 101 46731 14980/systemd-resol
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 0 17752 865/sshd
tcp6 0 0 :::22 :::* LISTEN 0 17757 865/sshd
tcp6 0 0 :::8089 :::* LISTEN 0 306971 24992/dockerd
tcp6 0 0 :::2377 :::* LISTEN 0 301970 24992/dockerd
tcp6 0 0 :::7946 :::* LISTEN 0 301986 24992/dockerd
udp 0 0 127.0.0.53:53 0.0.0.0:* 101 46730 14980/systemd-resol
udp 0 0 158.39.201.14:68 0.0.0.0:* 100 46591 14964/systemd-netwo
udp 0 0 0.0.0.0:4789 0.0.0.0:* 0 302125 -
udp6 0 0 fe80::f816:3eff:fef:546 :::* 100 46586 14964/systemd-netwo
udp6 0 0 :::7946 :::* 0 301987 24992/dockerd
Worker ports: sudo netstat -tuplen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 101 44998 15283/systemd-resol
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 0 15724 1010/sshd
tcp6 0 0 :::22 :::* LISTEN 0 15726 1010/sshd
tcp6 0 0 :::8089 :::* LISTEN 0 300227 25355/dockerd
tcp6 0 0 :::7946 :::* LISTEN 0 283636 25355/dockerd
udp 0 0 0.0.0.0:4789 0.0.0.0:* 0 285465 -
udp 0 0 127.0.0.53:53 0.0.0.0:* 101 44997 15283/systemd-resol
udp 0 0 158.39.201.15:68 0.0.0.0:* 100 233705 15247/systemd-netwo
udp6 0 0 :::7946 :::* 0 283637 25355/dockerd
udp6 0 0 fe80::f816:3eff:fee:546 :::* 100 48229 15247/systemd-netwo
Services running: sudo docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
m7eha88ff4wm nh_blazegraph replicated 1/1 lyrasis/blazegraph:2.1.5 *:8089->8080/tcp
Stack: sudo docker stack ps nh
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
tqkd9t4i03yt nh_blazegraph.1 lyrasis/blazegraph:2.1.5 nh-worker-0 Running Running 3 hours ago
Output of docker version:
Client: Docker Engine - Community
Version: 19.03.6
API version: 1.40
Go version: go1.12.16
Git commit: 369ce74a3c
Built: Thu Feb 13 01:27:49 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.6
API version: 1.40 (minimum version 1.12)
Go version: go1.12.16
Git commit: 369ce74a3c
Built: Thu Feb 13 01:26:21 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
Output of docker info:
Client:
Debug Mode: false
Server:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 19.03.6
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: hpcm67vxrmkm1wvlhfqbjevox
Is Manager: true
ClusterID: gnl96swlf7o3a976oarvjrazt
Managers: 1
Nodes: 3
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 158.39.201.14
Manager Addresses:
158.39.201.14:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-74-generic
Operating System: Ubuntu 18.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.852GiB
Name: nh-manager-0
ID: PHBO:E6UZ:RNJL:5LVU:OZXW:FM5M:GQVW:SCAQ:EEQW:7IIW:GARL:AUHI
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Service inspection: sudo docker service inspect --pretty nh_blazegraph
ID: ef9s5lesysovh5x2653qc6dk9
Name: nh_blazegraph
Labels:
com.docker.stack.image=lyrasis/blazegraph:2.1.5
com.docker.stack.namespace=nh
Service Mode: Replicated
Replicas: 1
Placement:
Constraints: [node.role == worker]
UpdateConfig:
Parallelism: 1
On failure: pause
Monitoring Period: 5s
Max failure ratio: 0
Update order: stop-first
RollbackConfig:
Parallelism: 1
On failure: pause
Monitoring Period: 5s
Max failure ratio: 0
Rollback order: stop-first
ContainerSpec:
Image: lyrasis/blazegraph:2.1.5@sha256:e9fb46c9d7b2fc23202945a3d71b99ad8df2d7a18dcbcccc04cfc4f791b569e9
Resources:
Networks: news-hunter
Endpoint Mode: vip
Ports:
PublishedPort = 8089
Protocol = tcp
TargetPort = 8080
PublishMode = ingress
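To double-check the publish mode without the --pretty view, the raw endpoint section can be queried directly; a sketch, assuming the standard docker service inspect JSON layout:
# prints the published ports, including their PublishMode (ingress vs host)
sudo docker service inspect nh_blazegraph --format '{{json .Endpoint.Ports}}'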
Additional environment details (AWS, VirtualBox, physical, etc.):
We are running on an OpenStack IaaS cloud provider.
Our workload is expected to handle more than 1,000 HTTP requests per minute from external sources and more than 5,000 requests per minute between nodes.
Cross-posted:
https://forums.docker.com/t/swarm-mode-routing-mesh-not-working-instead-is-using-host-mode-by-default/89731
https://github.com/moby/moby/issues/40590
This indicates that the overlay/vxlan ports are blocked between the nodes in the cluster. The ports used by vxlan are:
- TCP and UDP port 7946 for communication among nodes
- UDP port 4789 for overlay network traffic
Source: https://docs.docker.com/network/overlay/
The iptables output shown indicates this is not being blocked within the Linux host itself (the INPUT and OUTPUT policies default to ACCEPT), so I would look at the network used to run the VMs. E.g. VMware NSX uses these ports and blocks nested VMs as a result. A connectivity check between two nodes is sketched below.
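A minimal way to confirm whether UDP 4789 actually passes between two nodes (a sketch, assuming netcat and tcpdump are available on the Ubuntu hosts; 158.39.201.15 is the worker address from the netstat output above):
# on the receiving node: watch for anything arriving on the vxlan port
sudo tcpdump -ni any udp port 4789
# on the sending node: push a test datagram towards the receiver
echo "vxlan-test" | nc -u -w1 158.39.201.15 4789
# if tcpdump never sees the packet, the provider network is dropping
# UDP 4789 before it reaches the host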
Without any doubt, the problem was UDP port 4789. For some strange reason it was being blocked by our OpenStack-based IaaS cloud provider; we were never able to find out why.
The solution was to move the container ingress network off UDP port 4789 by adding the --data-path-port option, as suggested by @BMitch in:
docker swarm init --advertise-addr <MANAGER-IP> --data-path-port 5789
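Note that --data-path-port can only be set when the swarm is initialized, so the swarm has to be recreated for the change to take effect. The active port can then be confirmed from the Data Path Port field of docker info:
# should now report the non-default port, e.g. "Data Path Port: 5789"
sudo docker info | grep -i "data path port"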