Docker swarm worker node cannot serve the nginx service it is hosting
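As a learning exercise, I am trying to set up a Docker swarm on two test AWS EC2 instances, but I run into a problem when I try to access the service from the worker node's IP address.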
On the master, I ran docker swarm init. I then took the token it printed and ran docker swarm join --token <token> <Master Private IP>:2377 on the worker.
Next, on the master, I created a simple service with docker service create -p 80:80 --name nginx nginx, followed by docker service scale nginx=2. Checking docker service ps nginx now gives the following:
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
idux5dftj9oj nginx.1 nginx:latest ip-172-31-13-2 Running Running 12 minutes ago
2nwfw3fncybj nginx.2 nginx:latest ip-172-31-14-130 Running Running 38 seconds ago
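I have opened the inbound ports on the security group according to this guide, specifically:
- TCP port 2377
- TCP and UDP port 7946
- UDP port 4789
The master and the worker share the same security group, so I set the source of these rules to the security group itself.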
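For reference, rules like the ones listed above could be created with the AWS CLI. This is only a sketch under assumptions: the function name open_swarm_ports, the AWS override variable, and the group ID are hypothetical, and --source-group is assumed to accept the group's own ID, which should hold in a default VPC (other VPCs may need --ip-permissions instead).

```shell
# Allow swarm traffic between members of a single security group.
# Setting AWS=echo turns the calls into a dry run (hypothetical convention).
open_swarm_ports() {
  sg_id="$1"
  # protocol/port pairs taken from the list above
  for rule in tcp/2377 tcp/7946 udp/7946 udp/4789; do
    "${AWS:-aws}" ec2 authorize-security-group-ingress \
      --group-id "$sg_id" \
      --protocol "${rule%/*}" \
      --port "${rule#*/}" \
      --source-group "$sg_id"
  done
}
```

Running AWS=echo open_swarm_ports <group id> prints the four commands without calling AWS, which is a cheap way to review them first.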
When I run curl http://localhost on the master, it returns this, which proves that it works:
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
<!-- Omitting this for brevity -->
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<!-- Omitting this for brevity -->
</body>
But on the worker, all I get is curl: (7) Failed to connect to localhost port 80: Connection refused
Running docker ps on the worker gives me:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b37770b153db nginx:latest "nginx -g 'daemon of…" 34 minutes ago Up 34 minutes 80/tcp nginx.2.2nwfw3fncybjj7qzeierlx0xr
Running docker service inspect nginx on the master gives:
[
{
"ID": "887xm47oavn367w0o4bo1nmce",
"Version": {
"Index": 652
},
"CreatedAt": "2019-05-19T07:50:54.491113206Z",
"UpdatedAt": "2019-05-19T08:02:53.454804111Z",
"Spec": {
"Name": "nginx",
"Labels": {},
"TaskTemplate": {
"ContainerSpec": {
"Image": "nginx:latest@sha256:23b4dcdf0d34d4a129755fc6f52e1c6e23bb34ea011b315d87e193033bcd1b68",
"Init": false,
"StopGracePeriod": 10000000000,
"DNSConfig": {},
"Isolation": "default"
},
"Resources": {
"Limits": {},
"Reservations": {}
},
"RestartPolicy": {
"Condition": "any",
"Delay": 5000000000,
"MaxAttempts": 0
},
"Placement": {
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
},
{
"OS": "linux"
},
{
"Architecture": "arm64",
"OS": "linux"
},
{
"Architecture": "386",
"OS": "linux"
},
{
"Architecture": "ppc64le",
"OS": "linux"
},
{
"Architecture": "s390x",
"OS": "linux"
}
]
},
"ForceUpdate": 0,
"Runtime": "container"
},
"Mode": {
"Replicated": {
"Replicas": 2
}
},
"UpdateConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"RollbackConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"EndpointSpec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
}
},
"PreviousSpec": {
"Name": "nginx",
"Labels": {},
"TaskTemplate": {
"ContainerSpec": {
"Image": "nginx:latest@sha256:23b4dcdf0d34d4a129755fc6f52e1c6e23bb34ea011b315d87e193033bcd1b68",
"Init": false,
"DNSConfig": {},
"Isolation": "default"
},
"Resources": {
"Limits": {},
"Reservations": {}
},
"Placement": {
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
},
{
"OS": "linux"
},
{
"Architecture": "arm64",
"OS": "linux"
},
{
"Architecture": "386",
"OS": "linux"
},
{
"Architecture": "ppc64le",
"OS": "linux"
},
{
"Architecture": "s390x",
"OS": "linux"
}
]
},
"ForceUpdate": 0,
"Runtime": "container"
},
"Mode": {
"Replicated": {
"Replicas": 1
}
},
"EndpointSpec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
}
},
"Endpoint": {
"Spec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
},
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
],
"VirtualIPs": [
{
"NetworkID": "6scdvoeno2tviu4zgyldmq6b4",
"Addr": "10.255.0.82/16"
}
]
}
}
]
Here is the master's docker info:
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 4
Server Version: 18.09.6
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: q4h5ahgxf1xwuyi2aotyt20iy
Is Manager: true
ClusterID: r88oqh59x74bl1kqrcg5od2qd
Managers: 1
Nodes: 2
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 172.31.13.2
Manager Addresses:
172.31.13.2:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-1021-aws
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.945GiB
Name: ip-172-31-13-2
ID: RM34:I2IM:EJ2V:W74X:ECSD:ABCC:ZB4T:B7UO:OIWW:SUQ2:ILDB:HQLQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
Here is the worker's docker info:
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 4
Server Version: 18.09.5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: slya32xwjmklumhm23bt7xs6m
Is Manager: false
Node Address: 172.31.14.130
Manager Addresses:
172.31.13.2:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-1021-aws
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.945GiB
Name: ip-172-31-14-130
ID: X7FI:3VCW:OCVI:5XSX:HJ24:2NOD:NQYU:SEYL:JVIJ:J4DI:F5UL:NKZT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: bizmd
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
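As far as I can tell, nothing should go wrong after adding the worker to the swarm and creating the service. Even so, the worker cannot reach the nginx service it is already hosting.
What is causing this?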
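I wanted to check which ports were actually open on my worker server (as opposed to which ports were merely allowed through the firewall).
netstat -tulpn told me: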
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp6 0 0 :::9443 :::* LISTEN -
tcp6 0 0 :::22 :::* LISTEN -
udp 19968 0 127.0.0.53:53 0.0.0.0:* -
udp 0 0 172.31.14.130:68 0.0.0.0:* -
udp 0 0 0.0.0.0:4789 0.0.0.0:* -
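I noticed that no process was using 7946, which is one of the ports that needs to be open. So I restarted the Docker service: sudo service docker restart
Once the restart finished, I saw a process start up and bind to the port. Sure enough, I was then able to run curl localhost on either node.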
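The port check above can be scripted. A minimal sketch, assuming netstat -tuln style output where the local address is the fourth column; has_listener is a hypothetical helper name, not part of Docker or net-tools.

```shell
# Succeeds if the netstat listing on stdin shows a socket bound to the given port.
has_listener() {
  # Column 4 is "Local Address"; match an address ending in ":<port>".
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Example on a swarm node (root is needed to see every socket):
#   sudo netstat -tuln | has_listener 7946 || echo "nothing bound to 7946"
```

Fed the worker's listing shown earlier, it would find 4789 but not 7946, which matches what prompted the restart.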