Docker Swarm 工作节点未访问互联网
Docker Swarm worker node not accessing the internet
我已经初始化了一个有 1 名经理和 1 名工作人员的群,每个人都在不同的主机上,遵循 official tutorial. I also use Traefik, following these instructions on dockerswarm.rocks,使用简单的覆盖网络创建:
docker network create --driver=overlay traefik-public
现在部署我的一个服务,需要上网
虽然在 管理器节点 上部署服务时效果很好,但在 工作节点 上失败了。
docker-compose.yml
version: '3.5'
services:
export-phyc:
image: my.docker.registry/my/image
networks:
- traefik-public
deploy:
labels:
- traefik.enable=true
- traefik.docker.network=traefik-public
- traefik.constraint-label=traefik-public
- traefik.http.routers.myservice-http.rule=Host(`my.domain`)
- traefik.http.routers.myservice-http.entrypoints=http
- traefik.http.routers.myservice-http.middlewares=https-redirect
- traefik.http.routers.myservice-https.rule=Host(`my.domain`)
- traefik.http.routers.myservice-https.entrypoints=https
- traefik.http.routers.myservice-https.tls=true
- traefik.http.routers.myservice-https.tls.certresolver=le
- traefik.http.services.myservice.loadbalancer.server.port=80
networks:
traefik-public:
external: true
两台主机具有相同的 DNS 配置:
# cat /etc/resolv.conf
domain openstacklocal
search openstacklocal
nameserver 213.186.xx.xx
两个任务也有相同的 DNS 配置(但与主机不同):
# docker container exec <my-container-id> cat /etc/resolv.conf
search openstacklocal
nameserver 127.0.0.xx
options ndots:0
然而,经理上的任务可以访问互联网:
# docker container exec <my-container-id> wget google.com
Connecting to google.com (216.58.215.46:80)
Connecting to www.google.com (216.58.206.228:80)
saving to 'index.html'
index.html 100% |********************************| 13848 0:00:00 ETA
'index.html' saved
工作人员上的任务不能:
# docker container exec <my-container-id> wget google.com
wget: bad address 'google.com'
# docker container exec <my-container-id> wget 216.58.204.142
Connecting to 216.58.204.142 (216.58.204.142:80)
wget: can't connect to remote host (216.58.204.142): Operation timed out
我最糊涂了。如何让工作节点上的任务访问互联网?
所以问题出在我的防火墙 (iptables) 乱用 Docker 设置的规则。我确实需要实施自己的规则(在重新启动时启动),并且 Docker 必须设置其内部通信规则(每次 docker 守护程序重新启动时设置)。
我不是 iptables 的行家,我只是得到了一个应该可以很好地处理 Docker Swarm 的,但是其中缺少一行:
-A DOCKER-USER -j RETURN
示例 iptables 规则:
*filter
:INPUT ACCEPT [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:FILTERS - [0:0]
:DOCKER-USER - [0:0]
-F INPUT
-F DOCKER-USER
-F FILTERS
-A INPUT -i lo -j ACCEPT
-A INPUT -j FILTERS
-A DOCKER-USER -i eno1 -j FILTERS
-A DOCKER-USER -j RETURN
-A FILTERS -m state --state ESTABLISHED,RELATED -j ACCEPT
#allow 80 and 443
-A FILTERS -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A FILTERS -m state --state NEW -m tcp -p tcp --dport 443 -j ACCEPT
###### System
# Allow SSH connections
-A FILTERS -p tcp --dport 22 -j ACCEPT
# Docker SWARM cluster connections
-A FILTERS -p tcp --dport 2377 -j ACCEPT
-A FILTERS -p tcp --dport 7946 -j ACCEPT
-A FILTERS -p udp --dport 7946 -j ACCEPT
-A FILTERS -p udp --dport 4789 -j ACCEPT
###### Rules home
# ...
###### end
-A FILTERS -j REJECT --reject-with icmp-host-prohibited
COMMIT
我已经初始化了一个有 1 名经理和 1 名工作人员的群,每个人都在不同的主机上,遵循 official tutorial. I also use Traefik, following these instructions on dockerswarm.rocks,使用简单的覆盖网络创建:
docker network create --driver=overlay traefik-public
现在部署我的一个服务,需要上网
虽然在 管理器节点 上部署服务时效果很好,但在 工作节点 上失败了。
docker-compose.yml
version: '3.5'
services:
export-phyc:
image: my.docker.registry/my/image
networks:
- traefik-public
deploy:
labels:
- traefik.enable=true
- traefik.docker.network=traefik-public
- traefik.constraint-label=traefik-public
- traefik.http.routers.myservice-http.rule=Host(`my.domain`)
- traefik.http.routers.myservice-http.entrypoints=http
- traefik.http.routers.myservice-http.middlewares=https-redirect
- traefik.http.routers.myservice-https.rule=Host(`my.domain`)
- traefik.http.routers.myservice-https.entrypoints=https
- traefik.http.routers.myservice-https.tls=true
- traefik.http.routers.myservice-https.tls.certresolver=le
- traefik.http.services.myservice.loadbalancer.server.port=80
networks:
traefik-public:
external: true
两台主机具有相同的 DNS 配置:
# cat /etc/resolv.conf
domain openstacklocal
search openstacklocal
nameserver 213.186.xx.xx
两个任务也有相同的 DNS 配置(但与主机不同):
# docker container exec <my-container-id> cat /etc/resolv.conf
search openstacklocal
nameserver 127.0.0.xx
options ndots:0
然而,经理上的任务可以访问互联网:
# docker container exec <my-container-id> wget google.com
Connecting to google.com (216.58.215.46:80)
Connecting to www.google.com (216.58.206.228:80)
saving to 'index.html'
index.html 100% |********************************| 13848 0:00:00 ETA
'index.html' saved
工作人员上的任务不能:
# docker container exec <my-container-id> wget google.com
wget: bad address 'google.com'
# docker container exec <my-container-id> wget 216.58.204.142
Connecting to 216.58.204.142 (216.58.204.142:80)
wget: can't connect to remote host (216.58.204.142): Operation timed out
我最糊涂了。如何让工作节点上的任务访问互联网?
所以问题出在我的防火墙 (iptables) 乱用 Docker 设置的规则。我确实需要实施自己的规则(在重新启动时启动),并且 Docker 必须设置其内部通信规则(每次 docker 守护程序重新启动时设置)。
我不是 iptables 的行家,我只是得到了一个应该可以很好地处理 Docker Swarm 的,但是其中缺少一行:
-A DOCKER-USER -j RETURN
示例 iptables 规则:
*filter
:INPUT ACCEPT [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:FILTERS - [0:0]
:DOCKER-USER - [0:0]
-F INPUT
-F DOCKER-USER
-F FILTERS
-A INPUT -i lo -j ACCEPT
-A INPUT -j FILTERS
-A DOCKER-USER -i eno1 -j FILTERS
-A DOCKER-USER -j RETURN
-A FILTERS -m state --state ESTABLISHED,RELATED -j ACCEPT
#allow 80 and 443
-A FILTERS -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A FILTERS -m state --state NEW -m tcp -p tcp --dport 443 -j ACCEPT
###### System
# Allow SSH connections
-A FILTERS -p tcp --dport 22 -j ACCEPT
# Docker SWARM cluster connections
-A FILTERS -p tcp --dport 2377 -j ACCEPT
-A FILTERS -p tcp --dport 7946 -j ACCEPT
-A FILTERS -p udp --dport 7946 -j ACCEPT
-A FILTERS -p udp --dport 4789 -j ACCEPT
###### Rules home
# ...
###### end
-A FILTERS -j REJECT --reject-with icmp-host-prohibited
COMMIT