Why won't my custom Dockerfile connect over the docker-compose network when other services will?

The problem

I'm trying to create a docker-compose file that hosts three services: InfluxDB, Grafana, and a custom script in a custom Dockerfile that populates the database. I'm running into a networking problem where the custom script cannot connect to InfluxDB due to a connection refused error (shown below).

What does work

Interestingly, when I remove the custom script service (called ads_agent) from my docker-compose file and run the script from my local host, or even build and run the Dockerfile in its own standalone container, it connects just fine.

What's the difference between the two?

My script reads an environment variable called KTS_TELEMETRY_INFLUXDB_URL for the URL the InfluxDB client connects to. I can use "http://localhost:8086" as the URL when I run the script from my command line, and that works. When I wrap the script in a Docker container, I use my local machine's LAN IP address instead, since to the container, localhost is just the container itself. Even so, that also works fine.

In my docker-compose file, since all three services are on the same network, I use "http://influxdb:8086", because that hostname should be bound to that service's network interface. And it is, because Grafana connects just fine using that URL. Sadly, when I try to do the same from my script, the connection is refused.
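
(As a quick sanity check that the compose DNS name is actually visible from the script's container, the following standard-library sketch can be run inside the ads_agent container — nothing here is specific to my setup beyond the hostname and port:)

import socket

# resolve the compose service name the same way the InfluxDB client would
print(socket.getaddrinfo("influxdb", 8086))

# then attempt a raw TCP connection; this raises ConnectionRefusedError (errno 111)
# when the name resolves but nothing accepts the connection on that port
socket.create_connection(("influxdb", 8086), timeout=5)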

The error

urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f18c1fec970>: Failed to establish a new connection: [Errno 111] Connection refused

My code

Here is my docker-compose.yaml:

version: "3"
services:
  influxdb:
    container_name: influxdb
    image: influxdb:2.0.9-alpine # influxdb:latest
    networks:
      - telemetry_network
    ports:
      - 8086:8086
    volumes:
      - influxdb-storage:/var/lib/influxdb2
    restart: always
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=$KTS_TELEMETRY_INFLUXDB_USERNAME
      - DOCKER_INFLUXDB_INIT_PASSWORD=$KTS_TELEMETRY_INFLUXDB_PASSWORD
      - DOCKER_INFLUXDB_INIT_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - DOCKER_INFLUXDB_INIT_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET
      - DOCKER_INFLUXDB_INIT_RETENTION=$KTS_TELEMETRY_INFLUXDB_RETENTION
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
  grafana:
    container_name: grafana
    image: grafana/grafana:8.1.7 # grafana/grafana:latest
    networks:
      - telemetry_network
    ports:
      - 3000:3000
    volumes:
      - grafana-storage:/var/lib/grafana
    restart: always
    depends_on:
      - influxdb
  ads_agent:
    container_name: ads_agent
    build: ./ads_agent
    networks:
      - telemetry_network
    restart: always
    depends_on:
      - influxdb
    environment:
      - KTS_TELEMETRY_INFLUXDB_URL=http://influxdb:8086
      - KTS_TELEMETRY_INFLUXDB_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
      - KTS_TELEMETRY_INFLUXDB_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - KTS_TELEMETRY_INFLUXDB_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET

networks:
  telemetry_network:

volumes:
  influxdb-storage:
  grafana-storage:

Here is my ads_agent/Dockerfile:

FROM python:3.9
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r /requirements.txt
COPY main.py .
ENTRYPOINT /usr/local/bin/python3 /main.py

ads_agent/requirements.txt contains only influxdb-client, and here is my ads_agent/main.py:

import os
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
from datetime import datetime
import random
import time

token = os.environ["KTS_TELEMETRY_INFLUXDB_TOKEN"]
org = os.environ["KTS_TELEMETRY_INFLUXDB_ORG"]
bucket = os.environ["KTS_TELEMETRY_INFLUXDB_BUCKET"]
url = os.environ["KTS_TELEMETRY_INFLUXDB_URL"]

client = InfluxDBClient(url=url, token=token)
dbh = client.write_api(write_options=SYNCHRONOUS)

while True:
    symbol_name = 'rand_num'
    value = random.random()
    timestamp = datetime.utcnow()
    print(timestamp, symbol_name, value)
    point = Point("mem") \
        .field(symbol_name, value) \
        .time(timestamp, WritePrecision.NS)
    dbh.write(bucket, org, point)
    time.sleep(1)

Your problem is not about network connectivity, it's about startup order. Even though you declare depends_on: - influxdb for ads_agent, there is still a chance that InfluxDB has not finished starting up by the time your script tries to connect to it.

That's also why it succeeds when you do it manually: the manual steps introduce enough of a delay that the database is already ready by the time you connect.

See this for the reason:

depends_on does not wait for db and redis to be “ready” before starting web - only until they have been started. If you need to wait for a service to be ready, see Controlling startup order for more on this problem and strategies for solving it.

To make sure your database is really up before your script starts, refer to Control startup and shutdown order in Compose:

To handle this, design your application to attempt to re-establish a connection to the database after a failure. If the application retries the connection, it can eventually connect to the database.

The best solution is to perform this check in your application code, both at startup and whenever a connection is lost for any reason. However, if you don’t need this level of resilience, you can work around the problem with a wrapper script:

  • Use a tool such as wait-for-it, dockerize, sh-compatible wait-for, or RelayAndContainers template. These are small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections. For example, to use wait-for-it.sh or wait-for to wrap your service’s command:

    version: "2"
    services:
      web:
        build: .
        ports:
          - "80:8000"
        depends_on:
          - "db"
        command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
      db:
        image: postgres
    
  • Alternatively, write your own wrapper script to perform a more application-specific health check.
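
Applied to your compose setup, the application-side option amounts to making ads_agent wait (or retry) until InfluxDB is accepting connections before it creates the client. Here is a minimal wait-for-style sketch using only the Python standard library; the helper name, timeout, and the assumption that "the port accepts TCP" is close enough to "ready" are mine — if the InfluxDB init step needs longer, retrying the first write is the more robust variant.

import os
import socket
import time
from urllib.parse import urlparse

def wait_for_tcp(url, timeout=60.0, interval=1.0):
    # Poll host:port from the given URL until it accepts TCP connections,
    # mirroring what wait-for-it.sh does, but inside the Python entrypoint.
    parsed = urlparse(url)
    host, port = parsed.hostname, parsed.port or 80
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError:
            if time.monotonic() >= deadline:
                raise RuntimeError(f"{host}:{port} still refusing connections after {timeout}s")
            time.sleep(interval)

# call this at the top of main.py, before InfluxDBClient(...) is constructed
wait_for_tcp(os.environ["KTS_TELEMETRY_INFLUXDB_URL"])

This keeps the existing ENTRYPOINT unchanged, whereas the wait-for-it.sh approach from the quoted documentation would instead move the wait into the service's command: in docker-compose.yaml.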