Docker swarm TLS: Failed to validate pending node
I see this log on my swarm manager container:
time="2016-04-15T02:47:59Z" level=debug msg="Failed to validate pending node: lookup node1 on 10.0.2.3:53: server misbehaving" Addr="node1:2376"
I have set up a GitHub repository to reproduce my problem: https://github.com/casertap/playing-with-swarm-tls
I am running the cluster on 2 machines (built with Vagrant):
$script2 = <<STOP
service docker stop
sed -i 's/DOCKER_OPTS=/DOCKER_OPTS="-H tcp:\/\/0.0.0.0:2376 -H unix:\/\/\/var\/run\/docker.sock --tlsverify --tlscacert=\/home\/vagrant\/.certs\/ca.pem --tlscert=\/home\/vagrant\/.certs\/cert.pem --tlskey=\/home\/vagrant\/.certs\/key.pem"/' /etc/init/docker.conf
service docker start
STOP
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.define "node1" do |app|
    app.vm.network "private_network", ip: "192.168.33.10"
    app.vm.provision "file", source: "ca.pem", destination: "~/.certs/ca.pem"
    app.vm.provision "file", source: "node1-cert.pem", destination: "~/.certs/cert.pem"
    app.vm.provision "file", source: "node1-priv-key.pem", destination: "~/.certs/key.pem"
    app.vm.provision "file", source: "node1.csr", destination: "~/.certs/node1.csr"
    app.vm.provision "docker"
    app.vm.provision :shell, :inline => $script2
  end
  config.vm.define "swarm" do |app|
    app.vm.network "private_network", ip: "192.168.33.12"
    app.vm.provision "shell", inline: "echo '192.168.33.10 node1' >> /etc/hosts"
    app.vm.provision "shell", inline: "echo '192.168.33.12 swarm' >> /etc/hosts"
    app.vm.provision "docker"
    app.vm.provision "file", source: "ca.pem", destination: "~/.certs/ca.pem"
    app.vm.provision "file", source: "swarm-cert.pem", destination: "~/.certs/cert.pem"
    app.vm.provision "file", source: "swarm-priv-key.pem", destination: "~/.certs/key.pem"
    app.vm.provision "file", source: "swarm.csr", destination: "~/.certs/swarm.csr"
  end
end
As you can see, /etc/init/docker.conf on node1 ends up with the following options:
DOCKER_OPTS="-H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --tlsverify --tlscacert=/home/vagrant/.certs/ca.pem --tlscert=/home/vagrant/.certs/cert.pem --tlskey=/home/vagrant/.certs/key.pem"
I run:
vagrant up
Then I connect to the swarm machine:
vagrant ssh swarm
export TOKEN=$(docker run swarm create)
#dd182b8d2bc8c03f417376296558ba29
docker run -d swarm join --advertise node1:2376 token://dd182b8d2bc8c03f417376296558ba29
node1 is defined in the /etc/hosts file, as you can see in the Vagrant provisioning file.
I then start the swarm manager at debug log level (without -d):
docker run -p 3376:3376 -v /home/vagrant/.certs:/certs:ro swarm -l debug manage --tlsverify --tlscacert=/certs/ca.pem --tlscert=/certs/cert.pem --tlskey=/certs/key.pem --host=0.0.0.0:3376 token://dd182b8d2bc8c03f417376296558ba29
The log shows:
time="2016-04-15T02:47:59Z" level=debug msg="Failed to validate pending node: lookup node1 on 10.0.2.3:53: server misbehaving" Addr="node1:2376"
The IP address of node1 in my /etc/hosts is actually:
192.168.33.10 node1
It seems Docker is trying to look up the node1 alias on the wrong bridge network?
========== More info:
You can check this URL to see whether the discovery service found node1, and it did:
https://discovery.hub.docker.com/v1/clusters/dd182b8d2bc8c03f417376296558ba29
Now, if you run the swarm manager with -d and execute:
vagrant@vagrant-ubuntu-trusty-64:~$ docker --tlsverify --tlscacert=/home/vagrant/.certs/ca.pem --tlscert=/home/vagrant/.certs/cert.pem --tlskey=/home/vagrant/.certs/key.pem -H swarm:3376 info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: swarm/1.2.0
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 1
(unknown): node1:2376
└ Status: Pending
└ Containers: 0
└ Reserved CPUs: 0 / 0
└ Reserved Memory: 0 B / 0 B
└ Labels:
└ Error: (none)
└ UpdatedAt: 2016-04-15T03:03:28Z
└ ServerVersion:
Plugins:
Volume:
Network:
Kernel Version: 3.13.0-85-generic
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B
Name: ee85273cbb64
Docker Root Dir:
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support
You can see that the node is stuck in status: Pending
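Although you defined node1 in the machine's /etc/hosts, the container in which the swarm manager runs does not have node1 in its own /etc/hosts file. By default, a container does not share the host's filesystem. See https://docs.docker.com/engine/userguide/containers/dockervolumes/. The Swarm manager therefore tries to look up node1 through the DNS resolver (10.0.2.3:53 in your log) and fails.
There are several ways to solve this:
- use a resolvable FQDN, so that the Swarm manager inside the container can resolve the node
- or provide the IP of node1 in the swarm join command
- or use the -v option to pass the /etc/hosts file from the host into the Swarm manager container. See the link above.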
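As a rough sketch of the second option, the helper below reads a hosts-format file on the machine that already knows node1 and prints a swarm join command that advertises the IP instead of the name. The function names lookup_host and join_command are hypothetical, made up for this illustration; the script only prints the command, it does not run Docker.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Docker or Swarm), for illustration only.

# Print the address mapped to a hostname in a hosts-format file.
# Usage: lookup_host NAME [HOSTS_FILE]
lookup_host() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }   # strip trailing comments before matching
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$hosts_file"
}

# Print (do not execute) a swarm join command that advertises an IP.
# Usage: join_command NAME PORT TOKEN [HOSTS_FILE]
join_command() {
    ip=$(lookup_host "$1" "${4:-/etc/hosts}")
    if [ -z "$ip" ]; then
        echo "no hosts entry for $1" >&2
        return 1
    fi
    echo "docker run -d swarm join --advertise ${ip}:$2 token://$3"
}
```

With the /etc/hosts from the question, join_command node1 2376 dd182b8d2bc8c03f417376296558ba29 would print a join command advertising 192.168.33.10:2376. Note that with --tlsverify the node certificate probably also needs to be valid for that IP, which I have not checked against the certificates in the repository. For the third option, adding -v /etc/hosts:/etc/hosts:ro to the docker run ... swarm manage command should make the host's entries visible inside the manager container; docker run also accepts --add-host node1:192.168.33.10 for a single entry.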