Why won't my custom Dockerfile connect over the docker-compose network when other services will?

The problem

I'm trying to create a docker-compose file that hosts three services: InfluxDB, Grafana, and a custom script in a custom Dockerfile that populates the database. I'm running into a networking problem where the custom script cannot connect to InfluxDB due to a connection refused error (shown below).

What does work

Interestingly, when I remove the custom script service (called ads_agent) from my docker-compose file and run the script from my local host, or even build and run the Dockerfile in its own standalone container, it connects just fine.

What's the difference between the two?

My script reads an environment variable called KTS_TELEMETRY_INFLUXDB_URL for the URL the InfluxDB client connects to. I can use "http://localhost:8086" as the URL when I run the script from my command line, and that works. When I wrap the script in a Docker container, I use my local machine's LAN IP address instead, since to the container, localhost is just the container itself. Even so, that also works fine.

In my docker-compose file, since all three services are on the same network, I use "http://influxdb:8086", because that hostname should be bound to that service's network interface. And it is, because Grafana connects just fine using that URL. Sadly, when I try to do the same from my script, the connection is refused.
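
(As a quick sanity check that the compose DNS name is actually visible from the script's container, the following standard-library sketch can be run inside the ads_agent container — nothing here is specific to my setup beyond the hostname and port:)

import socket

# resolve the compose service name the same way the InfluxDB client would
print(socket.getaddrinfo("influxdb", 8086))

# then attempt a raw TCP connection; this raises ConnectionRefusedError (errno 111)
# when the name resolves but nothing accepts the connection on that port
socket.create_connection(("influxdb", 8086), timeout=5)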

The error

urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f18c1fec970>: Failed to establish a new connection: [Errno 111] Connection refused

My code

Here is my docker-compose.yaml:

version: "3"
services:
  influxdb:
    container_name: influxdb
    image: influxdb:2.0.9-alpine # influxdb:latest
    networks:
      - telemetry_network
    ports:
      - 8086:8086
    volumes:
      - influxdb-storage:/var/lib/influxdb2
    restart: always
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=$KTS_TELEMETRY_INFLUXDB_USERNAME
      - DOCKER_INFLUXDB_INIT_PASSWORD=$KTS_TELEMETRY_INFLUXDB_PASSWORD
      - DOCKER_INFLUXDB_INIT_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - DOCKER_INFLUXDB_INIT_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET
      - DOCKER_INFLUXDB_INIT_RETENTION=$KTS_TELEMETRY_INFLUXDB_RETENTION
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
  grafana:
    container_name: grafana
    image: grafana/grafana:8.1.7 # grafana/grafana:latest
    networks:
      - telemetry_network
    ports:
      - 3000:3000
    volumes:
      - grafana-storage:/var/lib/grafana
    restart: always
    depends_on:
      - influxdb
  ads_agent:
    container_name: ads_agent
    build: ./ads_agent
    networks:
      - telemetry_network
    restart: always
    depends_on:
      - influxdb
    environment:
      - KTS_TELEMETRY_INFLUXDB_URL=http://influxdb:8086
      - KTS_TELEMETRY_INFLUXDB_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
      - KTS_TELEMETRY_INFLUXDB_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - KTS_TELEMETRY_INFLUXDB_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET

networks:
  telemetry_network:

volumes:
  influxdb-storage:
  grafana-storage:

Here is my ads_agent/Dockerfile:

FROM python:3.9
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r /requirements.txt
COPY main.py .
ENTRYPOINT /usr/local/bin/python3 /main.py

ads_agent/requirements.txt contains only influxdb-client, and here is my ads_agent/main.py:

import os
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
from datetime import datetime
import random
import time

token = os.environ["KTS_TELEMETRY_INFLUXDB_TOKEN"]
org = os.environ["KTS_TELEMETRY_INFLUXDB_ORG"]
bucket = os.environ["KTS_TELEMETRY_INFLUXDB_BUCKET"]
url = os.environ["KTS_TELEMETRY_INFLUXDB_URL"]

client = InfluxDBClient(url=url, token=token)
dbh = client.write_api(write_options=SYNCHRONOUS)

while True:
    symbol_name = 'rand_num'
    value = random.random()
    timestamp = datetime.utcnow()
    print(timestamp, symbol_name, value)
    point = Point("mem") \
        .field(symbol_name, value) \
        .time(timestamp, WritePrecision.NS)
    dbh.write(bucket, org, point)
    time.sleep(1)

Your problem is not about network connectivity, it's about startup order. Even though you declare depends_on: - influxdb for ads_agent, there is still a chance that InfluxDB has not finished starting up by the time your script tries to connect to it.

That's also why it succeeds when you do it manually: the manual steps introduce enough of a delay that the database is already ready by the time you connect.

See this for the reason:

depends_on does not wait for db and redis to be “ready” before starting web - only until they have been started. If you need to wait for a service to be ready, see Controlling startup order for more on this problem and strategies for solving it.

To make sure your database is really up before your script starts, refer to Control startup and shutdown order in Compose:

To handle this, design your application to attempt to re-establish a connection to the database after a failure. If the application retries the connection, it can eventually connect to the database.

The best solution is to perform this check in your application code, both at startup and whenever a connection is lost for any reason. However, if you don’t need this level of resilience, you can work around the problem with a wrapper script:

  • Use a tool such as wait-for-it, dockerize, sh-compatible wait-for, or RelayAndContainers template. These are small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections. For example, to use wait-for-it.sh or wait-for to wrap your service’s command:

    version: "2"
    services:
      web:
        build: .
        ports:
          - "80:8000"
        depends_on:
          - "db"
        command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
      db:
        image: postgres
    
  • Alternatively, write your own wrapper script to perform a more application-specific health check.
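
Applied to your compose setup, the application-side option amounts to making ads_agent wait (or retry) until InfluxDB is accepting connections before it creates the client. Here is a minimal wait-for-style sketch using only the Python standard library; the helper name, timeout, and the assumption that "the port accepts TCP" is close enough to "ready" are mine — if the InfluxDB init step needs longer, retrying the first write is the more robust variant.

import os
import socket
import time
from urllib.parse import urlparse

def wait_for_tcp(url, timeout=60.0, interval=1.0):
    # Poll host:port from the given URL until it accepts TCP connections,
    # mirroring what wait-for-it.sh does, but inside the Python entrypoint.
    parsed = urlparse(url)
    host, port = parsed.hostname, parsed.port or 80
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError:
            if time.monotonic() >= deadline:
                raise RuntimeError(f"{host}:{port} still refusing connections after {timeout}s")
            time.sleep(interval)

# call this at the top of main.py, before InfluxDBClient(...) is constructed
wait_for_tcp(os.environ["KTS_TELEMETRY_INFLUXDB_URL"])

This keeps the existing ENTRYPOINT unchanged, whereas the wait-for-it.sh approach from the quoted documentation would instead move the wait into the service's command: in docker-compose.yaml.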