Docker network: failing to address containers by name on a multi-host network

I have 3 hosts running Docker 1.11 in a swarm cluster, with an overlay network on top.

The machines are started with boot2docker and VirtualBox:

docker-machine create -d virtualbox af-consul
docker run -d -p "8500:8500" -h "consul" progrium/consul -server -bootstrap

docker-machine create -d virtualbox --virtualbox-memory 4096 \
--swarm \
--swarm-discovery="consul://$(docker-machine ip af-consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip af-consul):8500" \
--engine-opt="cluster-advertise=eth0:2376" af-repo

docker-machine create -d virtualbox --virtualbox-memory 4096 --swarm --swarm-master \
--swarm-discovery="consul://$(docker-machine ip af-consul):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip af-consul):8500" \
--engine-opt="cluster-advertise=eth0:2376" af-jenkins

The network is created like this:

eval $(docker-machine env --swarm af-jenkins)
docker network create -d overlay appfactory_overlay

The containers are then deployed with compose:

version: '2'
services:
    jenkins:
        user: root
        restart: always
        image: af-repo:5000/jenkins:latest
        # build: ./jenkins
        container_name: jenkins
        ports:
            - "8080:8080"
            - "50000:50000"
        networks:
            - appfactory_overlay
        volumes_from:
            - container:jenkins_data:rw
        environment:
            - "constraint:node==af-jenkins"
        external_links:
            - jenkins_data

    docker_registry:
        user: root
        container_name: docker_registry
        restart: always
        build: ./
        ports:
            - "5000:5000"
        networks:
            - appfactory_overlay
        environment:
            REGISTRY_HTTP_TLS_CERTIFICATE: /etc/ssl/certs/domain.crt
            REGISTRY_HTTP_TLS_KEY: /etc/ssl/certs/domain.key
            constraint: "node==af-repo"

    reports:
        user: root
        restart: always
        image: httpd:2.4
        container_name: reports
        ports:
            - "2222:22"
            - "80:80"
        volumes_from:
            - container:repository_data:rw
        networks:
            - appfactory_overlay
        environment:
            - "constraint:node==af-repo"

networks:
  appfactory_overlay:
    external:
      name: appfactory_overlay

When I ssh into af-repo and exec /bin/bash in the reports container, I can ping the docker_registry container, which is on the same host:

root@74068f00ffdb:/usr/local/apache2# ping docker_registry
PING docker_registry (10.0.0.2): 56 data bytes
64 bytes from 10.0.0.2: icmp_seq=6 ttl=64 time=0.157 ms
64 bytes from 10.0.0.2: icmp_seq=7 ttl=64 time=0.126 ms
64 bytes from 10.0.0.2: icmp_seq=8 ttl=64 time=0.127 ms
64 bytes from 10.0.0.2: icmp_seq=9 ttl=64 time=0.132 ms

But when I try to ping the jenkins container, which is on another host:

root@74068f00ffdb:/usr/local/apache2# ping jenkins
PING jenkins (10.0.0.5): 56 data bytes
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable
92 bytes from 74068f00ffdb (10.0.0.4): Destination Host Unreachable

The network looks correct when I inspect it:

docker@af-repo:~$ docker network inspect appfactory_overlay
[
    {
        "Name": "appfactory_overlay",
        "Id": "2cedd8b85bd302649123c3313e6e2beb5ddf3dc8265851fda1318f4b048dd795",
        "Scope": "global",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1/24"
                }
            ]
        },
        "Internal": false,
        "Containers": {
            "74068f00ffdb19135b605eb84ff6a23887a44dd03a3f9645a130fb03ad074312": {
                "Name": "reports",
                "EndpointID": "78503be0001b75b60e16f788009c1d5bb573011582ca828fbc3acd8ab3c67459",
                "MacAddress": "02:42:0a:00:00:04",
                "IPv4Address": "10.0.0.4/24",
                "IPv6Address": ""
            },
            "ep-eb61b033ae6e073c2634d36a8511434bbe26615e9c18df1c7545515b4f3a8c05": {
                "Name": "web",
                "EndpointID": "eb61b033ae6e073c2634d36a8511434bbe26615e9c18df1c7545515b4f3a8c05",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            },
            "ep-ee9feed770ef9f7684981e848a41de6bd67632f716672f8ed279dddc15afcc08": {
                "Name": "jenkins",
                "EndpointID": "ee9feed770ef9f7684981e848a41de6bd67632f716672f8ed279dddc15afcc08",
                "MacAddress": "02:42:0a:00:00:05",
                "IPv4Address": "10.0.0.5/24",
                "IPv6Address": ""
            },
            "fc125785613bf07202d2a0f39371394890bd04f7e5c4e34353ecc55e1e5d6b0b": {
                "Name": "docker_registry",
                "EndpointID": "6cf8d868c299d5b442abd5f370e1392a34585dede221cff19b95edb29baefaaa",
                "MacAddress": "02:42:0a:00:00:02",
                "IPv4Address": "10.0.0.2/24",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

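The output is the same when run from af-jenkins. So, as far as I can tell, my overlay network is not working at all: my ability to ping one container from another on the same host comes from the bridge network that is created automatically for those containers.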

Any help is much appreciated :)

Edit: here is the cluster info output

    { AppFactory } master » docker info
    Containers: 12
     Running: 8
     Paused: 0
     Stopped: 4
    Images: 27
    Server Version: swarm/1.2.4
    Role: primary
    Strategy: spread
    Filters: health, port, containerslots, dependency, affinity, constraint
    Nodes: 3
     af-jenkins: 192.168.99.102:2376
      └ ID: BAG4:XIOG:JXLS:7UAO:JHFT:NOAY:OW35:L2RK:QV37:OQ2H:G5QN:KR4D
      └ Status: Healthy
      └ Containers: 4 (3 Running, 0 Paused, 1 Stopped)
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 4.051 GiB
      └ Labels: kernelversion=4.4.16-boot2docker, operatingsystem=Boot2Docker 1.12.0 (TCL 7.2); HEAD : e030bab - Fri Jul 29 00:29:14 UTC 2016, provider=virtualbox, storagedriver=aufs
      └ UpdatedAt: 2016-07-30T10:15:40Z
      └ ServerVersion: 1.12.0
     af-repo: 192.168.99.101:2376
      └ ID: 5U2Z:46IU:CRAG:V3S7:Q5WF:TESM:KBRN:H7L3:7LZD:H4DX:DTEC:WXNE
      └ Status: Healthy
      └ Containers: 6 (3 Running, 0 Paused, 3 Stopped)
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 4.051 GiB
      └ Labels: kernelversion=4.4.16-boot2docker, operatingsystem=Boot2Docker 1.12.0 (TCL 7.2); HEAD : e030bab - Fri Jul 29 00:29:14 UTC 2016, provider=virtualbox, storagedriver=aufs
      └ UpdatedAt: 2016-07-30T10:15:45Z
      └ ServerVersion: 1.12.0
     af-web: 192.168.99.103:2376
      └ ID: HJ7O:62JN:6RVT:4UKL:AAJ3:6VWU:2LGF:EIYC:CJAH:F4RJ:GNCF:J3TR
      └ Status: Healthy
      └ Containers: 2 (2 Running, 0 Paused, 0 Stopped)
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 517.3 MiB
      └ Labels: kernelversion=4.4.16-boot2docker, operatingsystem=Boot2Docker 1.12.0 (TCL 7.2); HEAD : e030bab - Fri Jul 29 00:29:14 UTC 2016, provider=virtualbox, storagedriver=aufs
      └ UpdatedAt: 2016-07-30T10:15:18Z
      └ ServerVersion: 1.12.0
    Plugins:
     Volume:
     Network:
    Swarm:
     NodeID:
     Is Manager: false
     Node Address:
    Security Options:
    Kernel Version: 4.4.16-boot2docker
    Operating System: linux
    Architecture: amd64
    CPUs: 3
    Total Memory: 8.608 GiB
    Name: 8759d85be147
    Docker Root Dir:
    Debug Mode (client): false
    Debug Mode (server): false
    WARNING: No kernel memory limit support

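I think your problem is that you did not point the docker CLI at the cluster. Before creating the overlay network, you have to use the --swarm flag.

Connect to the swarm cluster: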
eval $(docker-machine env --swarm af-jenkins)
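To confirm that the eval above took effect, one rough check is to look at DOCKER_HOST. The sketch below assumes docker-machine's default swarm port, 3376 (a plain engine endpoint uses 2376); the function name points_at_swarm is made up for illustration.

```shell
#!/bin/sh
# Sketch: succeed only when DOCKER_HOST looks like a docker-machine swarm endpoint.
# Assumption: the swarm master listens on docker-machine's default port 3376.
# points_at_swarm is a hypothetical helper, not a docker or docker-machine command.
points_at_swarm() {
    case "${DOCKER_HOST:-}" in
        *:3376) return 0 ;;   # e.g. tcp://192.168.99.102:3376 -> swarm master
        *)      return 1 ;;   # unset, or a single engine on :2376
    esac
}

if points_at_swarm; then
    echo "CLI is pointed at the swarm master: ${DOCKER_HOST}"
else
    echo "CLI is NOT pointed at the swarm master (DOCKER_HOST=${DOCKER_HOST:-unset})"
fi
```

If the check fails, re-running the eval line in the same shell should set DOCKER_HOST to the swarm endpoint.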

If that is not the issue, please tell us what the docker info command outputs on your cluster.

Note: with docker 1.12.0, creating a cluster can be simplified.

When using docker-machine, Virtualbox's "host only" interface is eth1, try changing your cluster-advertise settings to use eth1 instead. – slugonamission
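The suggestion in this comment could be tried roughly as follows. This is only a sketch: it prints the docker-machine command rather than running it, the Consul address 192.168.99.100 is a placeholder (in the question it comes from docker-machine ip af-consul), and both helper names are invented for illustration. The cluster_advertise filter assumes the engine's docker info prints a "Cluster advertise:" line, which may vary by version.

```shell
#!/bin/sh
# Sketch only: nothing here creates or modifies a machine.
CONSUL_IP="192.168.99.100"   # placeholder; really: $(docker-machine ip af-consul)
ADVERTISE_IFACE="eth1"       # host-only interface suggested in the comment

# Hypothetical helper: print the create command for a swarm node named $1,
# identical to the one in the question except for the advertise interface.
swarm_create_cmd() {
    echo "docker-machine create -d virtualbox --virtualbox-memory 4096 --swarm" \
         "--swarm-discovery=consul://${CONSUL_IP}:8500" \
         "--engine-opt=cluster-store=consul://${CONSUL_IP}:8500" \
         "--engine-opt=cluster-advertise=${ADVERTISE_IFACE}:2376" \
         "$1"
}

# Hypothetical helper: read engine `docker info` text on stdin and print the
# advertised address, if the output contains such a line.
# Intended use: docker-machine ssh af-repo docker info | cluster_advertise
cluster_advertise() {
    grep -i 'cluster advertise:' | sed 's/^[^:]*: *//'
}

swarm_create_cmd af-repo
```

The advertised address on each node should be the 192.168.99.x address listed in the swarm's docker info output; if it is not, the other nodes have no route to it. Existing machines would presumably need to be recreated, or have their engine options edited, for the change to apply.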

Thanks.