Cannot ping docker container on another host using an overlay network
This question has been asked many times on all kinds of forums, but unfortunately none of the answers have helped me so far.
I will get right into the details.
OS: RHEL 7.7 Maipo
Docker Version: Engine/Client 18.09.7
My configuration:
Host 1: IP 169.192.215.74
Host 2: IP 10.210.87.16
On Host 1:
# netstat -plntu|grep -E "4789|7946|2377"
tcp6 0 0 :::2377 :::* LISTEN 3576/dockerd
tcp6 0 0 :::7946 :::* LISTEN 3576/dockerd
udp 0 0 0.0.0.0:4789 0.0.0.0:* -
udp6 0 0 :::7946 :::* 3576/dockerd
# docker swarm init --advertise-addr=169.192.215.74
# docker network create --attachable --driver overlay syndichain
# docker swarm join-token manager
docker swarm join --token SWMTKN-1-2rxp0qlhl8o5n1hlmf0umcj8jdub19t07ndbu7mlodeb1yi4uf-ceopcr39eaaiz6bfqd4kk6ojs 169.192.215.74:2377
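A quick sanity check that the network exists with swarm scope (swarm-scoped networks are listed on manager nodes, or on workers once a task is attached):
# docker network ls --filter driver=overlay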
Ping Host 2:
# ping 10.210.87.16
PING 10.210.87.16 (10.210.87.16) 56(84) bytes of data.
64 bytes from 10.210.87.16: icmp_seq=1 ttl=119 time=0.804 ms
64 bytes from 10.210.87.16: icmp_seq=2 ttl=119 time=1.49 ms
64 bytes from 10.210.87.16: icmp_seq=3 ttl=119 time=0.646 ms
64 bytes from 10.210.87.16: icmp_seq=4 ttl=119 time=0.664 ms
^C
--- 10.210.87.16 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.646/0.901/1.493/0.348 ms
# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:8d:a0:60 brd ff:ff:ff:ff:ff:ff
inet 169.192.215.74/24 brd 169.192.215.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:a9:99:bf:c2 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
76: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ab:17:1e:2f brd ff:ff:ff:ff:ff:ff
inet 172.19.0.1/16 brd 172.19.255.255 scope global docker_gwbridge
valid_lft forever preferred_lft forever
468: veth49ae88e@if467: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
link/ether 82:e1:f6:1e:72:ec brd ff:ff:ff:ff:ff:ff link-netnsid 1
Bring up two containers on Host 1 (RHEL 7 image from our local repository):
# docker run --rm -it --network="syndichain" -p 11002:11002 --name rhel.da85 957c8834c8a9 bash
# docker run --rm -it --network="syndichain" -p 11003:11003 --name rhel.da85.2 957c8834c8a9 bash
On Host 2:
# docker swarm join --token SWMTKN-1-2rxp0qlhl8o5n1hlmf0umcj8jdub19t07ndbu7mlodeb1yi4uf-ceopcr39eaaiz6bfqd4kk6ojs 169.192.215.74:2377 --advertise-addr=10.210.87.216
# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
yxphc861xra6ajc5llxrpp9ia * sd-cfd1-0319 Ready Active Reachable 18.09.7
2c45co3pj38u2gwqrvuawwi11 sd-d5d8-da85 Ready Active Leader 18.09.7
# netstat -plntu|grep -E "4789|7946|2377"
tcp6 0 0 :::2377 :::* LISTEN 8562/dockerd
tcp6 0 0 :::7946 :::* LISTEN 8562/dockerd
udp 0 0 0.0.0.0:4789 0.0.0.0:* -
udp6 0 0 :::7946 :::* 8562/dockerd
Ping Host 1:
# ping 169.192.215.74
PING 169.192.215.74 (169.192.215.74) 56(84) bytes of data.
64 bytes from 169.192.215.74: icmp_seq=1 ttl=55 time=0.796 ms
64 bytes from 169.192.215.74: icmp_seq=2 ttl=55 time=0.530 ms
64 bytes from 169.192.215.74: icmp_seq=3 ttl=55 time=0.513 ms
^C
--- 169.192.215.74 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.513/0.613/0.796/0.129 ms
# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:98:63:cb brd ff:ff:ff:ff:ff:ff
inet 10.210.87.216/23 brd 10.210.87.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:20:4a:de:3f brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
871: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:f0:d9:46:4c brd ff:ff:ff:ff:ff:ff
inet 172.23.0.1/16 brd 172.23.255.255 scope global docker_gwbridge
valid_lft forever preferred_lft forever
885: veth4d635e9@if884: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
link/ether 86:4f:5b:81:89:5c brd ff:ff:ff:ff:ff:ff link-netnsid 1
892: veth2734e0c@if891: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
link/ether b2:6c:1d:94:99:3f brd ff:ff:ff:ff:ff:ff link-netnsid 4
Bring up two RHEL7 containers on Host 2:
# docker run --rm -it --network="syndichain" --name rhel.0319 -p 11001:11001 957c8834c8a9 bash
# docker run --rm -it --network="syndichain" -p 11002:11002 --name rhel.0319.2 957c8834c8a9 bash
Now, let's run a few commands to inspect our overlay network:
On Host 1:
# docker network inspect syndichain
[
{
"Name": "syndichain",
"Id": "8ar2ar0z0brl5lp1lk13xo4ik",
"Created": "2020-02-28T11:44:01.411177608-05:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"fb911a7538d05f116d7f3ed50192e198d93faadaf1fb98f55fa79eb4cfbea72d": {
"Name": "rhel.da85",
"EndpointID": "8caf50c904f660c750bfa3608902d78239211e7076afed68d77a46e3e141386d",
"MacAddress": "02:42:0a:00:00:02",
"IPv4Address": "10.0.0.2/24",
"IPv6Address": ""
},
"lb-syndichain": {
"Name": "syndichain-endpoint",
"EndpointID": "0a122599aee9d9b84dde6383aeb432580769af886278b46310a1c65bc68a0bc9",
"MacAddress": "02:42:0a:00:00:03",
"IPv4Address": "10.0.0.3/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "7f037ddf9a5f",
"IP": "10.210.87.216"
},
{
"Name": "b1b424f73d55",
"IP": "169.192.215.74"
}
]
}
]
On Host 2:
# docker network inspect syndichain
[
{
"Name": "syndichain",
"Id": "8ar2ar0z0brl5lp1lk13xo4ik",
"Created": "2020-02-28T11:42:08.565999966-05:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"351d29af85e6e8addafcf69b3c464b49f8c9720cbaf8911328ab1aadce13e170": {
"Name": "rhel.0319.2",
"EndpointID": "fe0ed40c7a0b8687ca4a34d7b0196e382c32f3a5f82c43233174da349a8a5b34",
"MacAddress": "02:42:0a:00:00:04",
"IPv4Address": "10.0.0.4/24",
"IPv6Address": ""
},
"62493b5072d9c2e516a0b8bfb9d0ba6dea9c2cbca2787bf4f8c2bf5ffe021dd0": {
"Name": "rhel.0319",
"EndpointID": "9a5e8977ca434854da886c215f14583cc5a7c5bb8811d5408a5cae9c4a325c47",
"MacAddress": "02:42:0a:00:00:0e",
"IPv4Address": "10.0.0.14/24",
"IPv6Address": ""
},
"lb-syndichain": {
"Name": "syndichain-endpoint",
"EndpointID": "37415feb73f86cc3ceb1ab4d060da076ed4fca1f0b3ebe440993c82c14258a2d",
"MacAddress": "02:42:0a:00:00:0f",
"IPv4Address": "10.0.0.15/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "7f037ddf9a5f",
"IP": "10.210.87.216"
},
{
"Name": "b1b424f73d55",
"IP": "169.192.215.74"
}
]
}
]
Now let me demonstrate the problem:
Ping rhel.da85.2 container from rhel.da85 container (both containers on the same host)
[root@fb911a7538d0 /]# ping rhel.da85.2
PING rhel.da85.2 (10.0.0.5) 56(84) bytes of data.
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=1 ttl=64 time=0.080 ms
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=3 ttl=64 time=0.048 ms
^C
--- rhel.da85.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.048/0.067/0.080/0.015 ms
Ping rhel.0319 container from rhel.da85 container (containers on different hosts)
[root@fb911a7538d0 /]# ping rhel.0319
PING rhel.0319 (10.0.0.14) 56(84) bytes of data.
It gets stuck at this point. Nothing happens.
Ping rhel.0319.2 container from rhel.0319 container (both containers on the same host)
[root@62493b5072d9 /]# ping rhel.0319.2
PING rhel.0319.2 (10.0.0.4) 56(84) bytes of data.
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=1 ttl=64 time=0.109 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=2 ttl=64 time=0.061 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=3 ttl=64 time=0.064 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=4 ttl=64 time=0.068 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=5 ttl=64 time=0.064 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=6 ttl=64 time=0.063 ms
^C
--- rhel.0319.2 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.061/0.071/0.109/0.018 ms
Ping rhel.da85 container from rhel.0319 container (containers on different hosts)
[root@62493b5072d9 /]# ping rhel.da85
PING rhel.da85 (10.0.0.2) 56(84) bytes of data.
It gets stuck at this point as well, just like pinging a container on 0319 from da85.
I have even enabled IPv4 forwarding explicitly, based on a suggestion I found.
Yet the problem has persisted for the past week. Any help is appreciated.
Update:
It was suggested that I run tcpdump and nsenter. Below are the logs from tcpdump running at the container level (on the same VM that generates the PING requests), and the nsenter logs filtered for ARP. You can see that the ARP request never gets a response in the container-level tcpdump, yet nsenter shows ARP request/reply pairs.
[root@0ae70bfa66ae /]# tcpdump -v
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:26:03.276397 IP (tos 0x0, ttl 64, id 34495, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 1, length 64
16:26:04.276569 IP (tos 0x0, ttl 64, id 34971, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 2, length 64
16:26:05.276526 IP (tos 0x0, ttl 64, id 35636, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 3, length 64
16:26:06.276545 IP (tos 0x0, ttl 64, id 35935, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 4, length 64
16:26:07.276542 IP (tos 0x0, ttl 64, id 36706, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 5, length 64
16:26:08.276543 IP (tos 0x0, ttl 64, id 37284, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 6, length 64
16:26:08.294475 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has rhel.da85.syndichain tell rhel.0319.syndichain, length 28
# nsenter --net=/var/run/docker/netns/1-khyguu9i6n tcpdump -peni any "arp"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
11:45:23.302460 P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:23.302471 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:23.302485 In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:23.302488 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:44.294458 P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:44.294474 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:44.294477 In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:44.294479 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:46:05.302460 P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:46:05.302480 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:46:05.302483 In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:46:05.302485 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
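For anyone reproducing the nsenter capture: the namespace name passed to nsenter can be found by listing Docker's runtime network namespaces on the host; the overlay's namespace is the entry with a 1- prefix (an overlay-driver naming convention as far as I can tell, not a documented interface):
# ls /var/run/docker/netns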
Update 2:
From a container on the source host:
[root@2baddaa98452 /]# nc -zvu rhel.da85 4789
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.0.2:4789.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
Source host tcpdump:
# tcpdump -i ens192 port 4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
11:28:11.706587 IP sd-cfd1-0319.59121 > sd-d5d8-da85.4789: VXLAN, flags [I] (0x08), vni 4097
IP 10.0.0.11.44806 > 10.0.0.2.4789: [|VXLAN]
Target host tcpdump:
# tcpdump -i ens192 port 4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
From your comments, it appears that the overlay network ports are being blocked between the nodes. That could be iptables or another firewall tool on the hosts, a network firewall between the nodes, or other software (such as VM tooling or cloud router ACLs) blocking those ports. The ports that need to be open are (see the firewalld sketch after this list):
- TCP and UDP port 7946 for communication among nodes
- UDP port 4789 for overlay network traffic
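On RHEL 7 the host firewall is typically firewalld, so a minimal sketch for opening these ports on every node could look like this (an assumption about your environment; translate to raw iptables rules or your external firewall as appropriate):
# firewall-cmd --permanent --add-port=7946/tcp --add-port=7946/udp --add-port=4789/udp
# firewall-cmd --reload
# firewall-cmd --list-ports
Note that UDP 4789 shows no owning process in your netstat output because the kernel's VXLAN driver holds that socket; an empty PID column there is normal and says nothing about whether the port is reachable from the other node.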
If you want to use the overlay network and communicate across nodes, deploy your workload as a service instead of as a standalone container (a.k.a. docker run), as sketched below.
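For illustration, a sketch of running the same image as a swarm service on the existing overlay (the service name rhel-test and the sleep command are placeholders, not taken from your setup):
# docker service create --name rhel-test --replicas 2 --network syndichain 957c8834c8a9 sleep infinity
# docker service ps rhel-test
Swarm then schedules the replicas across the nodes and registers them in the overlay's DNS, much like your docker run containers, but with scheduling and restarts handled by the cluster.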