Why won't my custom Dockerfile connect over the docker-compose network when other services will?
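Problem
I'm trying to create a docker-compose file that hosts three services: InfluxDB, Grafana, and a custom script, built from a custom Dockerfile, that populates the database. I'm running into a networking problem: the custom script can't connect to InfluxDB because of a connection refused error (shown below).
What currently works
Interestingly, when I remove the custom script service (called ads_agent) from my docker-compose file and run the script from my local host, or even build and run the Dockerfile in its own container, it connects just fine.
What's the difference between the two
My script reads an environment variable called KTS_TELEMETRY_INFLUXDB_URL, which it uses as the URL for the InfluxDB client. When I run the script from my command line, I can use "http://localhost:8086" as the URL, and that works. When I wrap the script in a Docker container, I use my local machine's LAN IP address instead, because inside the container localhost refers only to the container itself. Even so, that works fine.
In my docker-compose file, since all three services are on the same network, I use "http://influxdb:8086", because that hostname should be bound to that service's network interface. And it is, because Grafana connects just fine using that URL. Sadly, when I try the same thing with my script, the connection is refused.
Error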
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f18c1fec970>: Failed to establish a new connection: [Errno 111] Connection refused
My code
Here is my docker-compose.yaml
version: "3"
services:
  influxdb:
    container_name: influxdb
    image: influxdb:2.0.9-alpine # influxdb:latest
    networks:
      - telemetry_network
    ports:
      - 8086:8086
    volumes:
      - influxdb-storage:/var/lib/influxdb2
    restart: always
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=$KTS_TELEMETRY_INFLUXDB_USERNAME
      - DOCKER_INFLUXDB_INIT_PASSWORD=$KTS_TELEMETRY_INFLUXDB_PASSWORD
      - DOCKER_INFLUXDB_INIT_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - DOCKER_INFLUXDB_INIT_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET
      - DOCKER_INFLUXDB_INIT_RETENTION=$KTS_TELEMETRY_INFLUXDB_RETENTION
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
  grafana:
    container_name: grafana
    image: grafana/grafana:8.1.7 # grafana/grafana:latest
    networks:
      - telemetry_network
    ports:
      - 3000:3000
    volumes:
      - grafana-storage:/var/lib/grafana
    restart: always
    depends_on:
      - influxdb
  ads_agent:
    container_name: ads_agent
    build: ./ads_agent
    networks:
      - telemetry_network
    restart: always
    depends_on:
      - influxdb
    environment:
      - KTS_TELEMETRY_INFLUXDB_URL=http://influxdb:8086
      - KTS_TELEMETRY_INFLUXDB_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
      - KTS_TELEMETRY_INFLUXDB_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - KTS_TELEMETRY_INFLUXDB_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET
networks:
  telemetry_network:
volumes:
  influxdb-storage:
  grafana-storage:
Here is my ads_agent/Dockerfile
FROM python:3.9
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r /requirements.txt
COPY main.py .
ENTRYPOINT /usr/local/bin/python3 /main.py
ads_agent/requirements.txt contains only influxdb-client, and here is my ads/main.py
import os
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
from datetime import datetime
import random
import time
token = os.environ["KTS_TELEMETRY_INFLUXDB_TOKEN"]
org = os.environ["KTS_TELEMETRY_INFLUXDB_ORG"]
bucket = os.environ["KTS_TELEMETRY_INFLUXDB_BUCKET"]
url = os.environ["KTS_TELEMETRY_INFLUXDB_URL"]
client = InfluxDBClient(url=url, token=token)
dbh = client.write_api(write_options=SYNCHRONOUS)
while True:
    symbol_name = 'rand_num'
    value = random.random()
    timestamp = datetime.utcnow()
    print(timestamp, symbol_name, value)
    point = Point("mem") \
        .field(symbol_name, value) \
        .time(timestamp, WritePrecision.NS)
    dbh.write(bucket, org, point)
    time.sleep(1)
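Your problem has nothing to do with network connectivity; it is only about startup order. Although you define depends_on: - influxdb for ads_agent, there is still a chance that InfluxDB has not finished starting up when your script tries to connect to it.
That is why it works when you do it by hand: your manual steps add a delay, and by then the database is ready.
The reason is given in the Compose documentation for depends_on:
depends_on does not wait for db and redis to be “ready” before starting web - only until they have been started. If you need to wait for a service to be ready.
To make sure your database is really up before your script starts, refer to Control startup and shutdown order in Compose: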
To handle this, design your application to attempt to re-establish a connection to the database after a failure. If the application retries the connection, it can eventually connect to the database.
The best solution is to perform this check in your application code, both at startup and whenever a connection is lost for any reason. However, if you don’t need this level of resilience, you can work around the problem with a wrapper script:
Use a tool such as wait-for-it, dockerize, sh-compatible wait-for, or RelayAndContainers template. These are small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections.
For example, to use wait-for-it.sh or wait-for to wrap your service’s command:
version: "2"
services:
  web:
    build: .
    ports:
      - "80:8000"
    depends_on:
      - "db"
    command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
  db:
    image: postgres
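Since your agent is already a Python script, the same "poll a host and port until it accepts TCP connections" idea can be sketched in Python instead of shipping a shell wrapper. This is only a minimal illustration, not what wait-for-it.sh does internally; the function name wait_for_port and its timeout and interval parameters are made up for this example.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Poll host:port until it accepts a TCP connection.

    Returns True once a connection succeeds, or False if `timeout`
    seconds pass without one. (Hypothetical helper for illustration.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example
            # ConnectionRefusedError) while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In main.py you could call something like wait_for_port("influxdb", 8086) before creating the InfluxDBClient and exit with an error if it returns False. Keep in mind that an open port only shows that something is listening; it does not prove the database has finished its initial setup.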
Alternatively, write your own wrapper script to perform a more application-specific health check.
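For InfluxDB, a more application-specific check could poll the HTTP API rather than the raw port. The sketch below assumes the /health endpoint that InfluxDB 2.x exposes, which reports "status": "pass" in its JSON body when the server is ready; the function name wait_until_healthy and its parameters are hypothetical, and only the standard library is used.

```python
import json
import time
import urllib.request


def wait_until_healthy(base_url, timeout=60.0, interval=2.0):
    """Poll <base_url>/health until it reports status "pass".

    Returns True when the service looks healthy, or False if `timeout`
    seconds pass first. (Hypothetical helper; assumes a JSON /health
    endpoint like the one in InfluxDB 2.x.)
    """
    deadline = time.monotonic() + timeout
    health_url = base_url.rstrip("/") + "/health"
    while True:
        try:
            with urllib.request.urlopen(health_url, timeout=interval) as resp:
                body = json.loads(resp.read().decode("utf-8"))
            if isinstance(body, dict) and body.get("status") == "pass":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts and HTTP errors
            # (URLError subclasses it); ValueError covers a non-JSON body.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

You would call it once at startup, for example wait_until_healthy(os.environ["KTS_TELEMETRY_INFLUXDB_URL"]), before entering the write loop. For the resilience the documentation recommends, the write itself should also be wrapped in a try/except that waits and retries when the connection drops later.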