Python app in Docker container doesn't stop/remove Docker container when app fails
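I have a Python app that polls a queue for new data and inserts it into a TimescaleDB database (TimescaleDB is an extension for PostgreSQL). This app must always stay running.
The problem is that the Python program may fail from time to time, and I expect Docker Swarm to restart the container. However, the container keeps running even after a failure. Why isn't my container failing and then being restarted by Docker Swarm?
The Python app looks something like this: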
def main():
    try:
        conn = get_db_conn()
        insert_data(conn)
    except Exception:
        logger.exception("Error with main inserter.py function")
        send_email_if_error()
        raise
    finally:
        try:
            conn.close()
            del conn
        except Exception:
            pass
        return 0

if __name__ == "__main__":
    main()
The Dockerfile looks like this:
FROM python:3.8-slim-buster

# Configure apt and install packages
RUN apt-get update && \
    apt-get -y --no-install-recommends install cron nano procps

# Install Python requirements.
RUN pip3 install --upgrade pip && \
    pip3 install poetry==1.0.10

COPY poetry.lock pyproject.toml /
RUN poetry config virtualenvs.create false && \
    poetry install --no-interaction --no-ansi

# Copy everything to the / folder inside the container
COPY . /

# Make /var/log the default directory in the container
WORKDIR /var/log

# Start Python app on container startup
CMD ["python3", "/inserter/inserter.py"]
The Docker Compose file:
version: '3.7'

services:
  inserter13:
    # Name and tag of image the Dockerfile creates
    image: mccarthysean/ijack:timescale
    depends_on:
      - timescale13
    env_file: .env
    environment:
      POSTGRES_HOST: timescale13
    networks:
      - traefik-public
    deploy:
      # Either global (exactly one container per physical node) or
      # replicated (a specified number of containers). The default is replicated
      mode: replicated
      # For stateless applications using "replicated" mode,
      # the total number of replicas to create
      replicas: 2
      restart_policy:
        condition: on-failure # default is 'any'

  timescale13:
    image: timescale/timescaledb:2.3.0-pg13
    volumes:
      - type: volume
        source: ijack-timescale-db-pg13
        target: /var/lib/postgresql/data # the location in the container where the data are stored
        read_only: false
      # Custom postgresql.conf file will be mounted (see command: as well)
      - type: bind
        source: ./postgresql_custom.conf
        target: /postgresql_custom.conf
        read_only: false
    env_file: .env
    command: ["-c", "config_file=/postgresql_custom.conf"]
    ports:
      - 0.0.0.0:5432:5432
    networks:
      traefik-public:
    deploy:
      # Either global (exactly one container per physical node) or
      # replicated (a specified number of containers). The default is replicated
      mode: replicated
      # For stateless applications using "replicated" mode,
      # the total number of replicas to create
      replicas: 1
      placement:
        constraints:
          # Since this is for the stateful database,
          # only run it on the swarm manager, not on workers
          - "node.role==manager"
      restart_policy:
        condition: on-failure # default is 'any'

# Use a named external volume to persist our data
volumes:
  ijack-timescale-db-pg13:
    external: true

networks:
  # Use the previously created public network "traefik-public", shared with other
  # services that need to be publicly available via this Traefik
  traefik-public:
    external: true
The "docker-compose.build.yml" file I use to build the "inserter.py" container image:
version: '3.7'

services:
  inserter:
    # Name and tag of image the Dockerfile creates
    image: mccarthysean/ijack:timescale
    build:
      # context: where should docker-compose look for the Dockerfile?
      # i.e. either a path to a directory containing a Dockerfile, or a url to a git repository
      context: .
      dockerfile: Dockerfile.inserter
    environment:
      POSTGRES_HOST: timescale
The Bash script I run, which builds, pushes, and deploys the database and inserter containers using Docker Swarm:
#!/bin/bash
# Build and tag image locally in one step.
# No need for docker tag <image> mccarthysean/ijack:<tag>
echo ""
echo "Building the image locally..."
echo "docker-compose -f docker-compose.build.yml build"
docker-compose -f docker-compose.build.yml build
# Push to Docker Hub
# docker login --username=mccarthysean
echo ""
echo "Pushing the image to Docker Hub..."
echo "docker push mccarthysean/ijack:timescale"
docker push mccarthysean/ijack:timescale
# Deploy to the Docker swarm and send login credentials
# to other nodes in the swarm with "--with-registry-auth"
echo ""
echo "Deploying to the Docker swarm..."
echo "docker stack deploy --with-registry-auth -c docker-compose.prod13.yml timescale13"
docker stack deploy --with-registry-auth -c docker-compose.prod13.yml timescale13
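When the Python inserter program fails (because of a database connection problem, or something else), it sends me an email alert, then raises the error and fails. At that point I expect the Docker container to fail and be restarted by Docker Swarm's restart_policy: on-failure. However, after an error, when I type docker service ls, I see 0/2 replicas: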
ID             NAME                      MODE         REPLICAS   IMAGE                              PORTS
u354h0uj4ug6   timescale13_inserter13    replicated   0/2        mccarthysean/ijack:timescale
o0rbfx5n2z4h   timescale13_timescale13   replicated   1/1        timescale/timescaledb:2.3.0-pg13   *:5432->5432/tcp
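When it's healthy (most of the time), it shows 2/2 replicas. Why isn't my container failing and then being restarted by Docker Swarm?

I figured it out, and updated my question to provide more detail about my try: except: failure routine.
Here's the error that occurred (actually two errors in sequence, as you can see):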
Traceback (most recent call last):
  File "/inserter/inserter.py", line 357, in execute_sql
    cursor.execute(sql, values)
psycopg2.errors.AdminShutdown: terminating connection due to administrator command
SSL connection has been closed unexpectedly

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/inserter/inserter.py", line 911, in main
    insert_alarm_log_rds(
  File "/inserter/inserter.py", line 620, in insert_alarm_log_rds
    rc = execute_sql(
  File "/inserter/inserter.py", line 364, in execute_sql
    conn.rollback()
psycopg2.InterfaceError: connection already closed
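As you can see, first there was a psycopg2.errors.AdminShutdown error, which was raised in my first try: except: routine. However, this was followed by a second psycopg2.InterfaceError, which actually happened in my finally: cleanup code. That cleanup code was followed by a pass statement and a return 0, so I guess the earlier error was not re-raised, and the process ended with exit code 0 instead of the non-zero exit code needed to trigger a restart.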
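The swallowing behaviour described above can be reproduced without Docker or a database. The following is a minimal sketch: the function names swallowed and propagated are hypothetical, and the RuntimeError only stands in for the psycopg2 errors. It shows that a return inside a finally: block discards whatever exception is in flight, while a finally: block without a return lets the exception continue to the caller.

```python
def swallowed():
    """Mimics the original main(): the return in finally wins."""
    try:
        raise RuntimeError("simulated database failure")
    finally:
        # A return here discards the in-flight exception,
        # so the caller never sees the RuntimeError.
        return 0


def propagated():
    """Same failure, but finally only cleans up and does not return."""
    try:
        raise RuntimeError("simulated database failure")
    finally:
        pass  # cleanup would go here


# No exception reaches this line; the function simply returns 0.
result = swallowed()

# Here the exception does reach the caller.
try:
    propagated()
    raised = False
except RuntimeError:
    raised = True

print(result, raised)
```

Newer Python versions may emit a warning about the return inside finally:, which is a useful hint that this pattern is usually unintended.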
@edijon's comment about needing a non-zero exit code helped me solve the problem.
I needed to re-raise the error in the finally: routine, as follows:
def main():
    try:
        conn = get_db_conn()
        insert_data(conn)
    except Exception:
        logger.exception("Error with main inserter.py function")
        send_email_if_error()
        raise
    finally:
        try:
            conn.close()
            del conn
        except Exception:
            # previously the following was just 'pass'
            # and I changed it to 'raise' to ensure errors
            # cause a non-zero error code for Docker's 'restart_policy'
            raise

        # The following was previously "return 0"
        # which caused the container not to restart...
        # Either comment it out, or change it to return non-zero
        return 1

if __name__ == "__main__":
    main()
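One caveat about the code above: the script calls main() without passing its result to sys.exit(), so the value main() returns never becomes the process exit code. Only an uncaught exception or an explicit sys.exit() makes Python exit non-zero. The following is a sketch of one way to make the exit code explicit, not the author's method. FakeConn and the bodies of get_db_conn, insert_data and send_email_if_error are hypothetical stand-ins so the example runs on its own, and run is an assumed wrapper name.

```python
import logging
import sys

logger = logging.getLogger("inserter")


class FakeConn:
    """Hypothetical stand-in for the real database connection."""

    def close(self):
        pass


def get_db_conn():
    # Hypothetical: the real function would open a psycopg2 connection.
    return FakeConn()


def insert_data(conn):
    # Hypothetical: simulate the failure seen in the traceback.
    raise RuntimeError("simulated database failure")


def send_email_if_error():
    # Hypothetical no-op replacement for the email alert.
    pass


def main():
    conn = None
    try:
        conn = get_db_conn()
        insert_data(conn)
    except Exception:
        logger.exception("Error with main inserter.py function")
        send_email_if_error()
        return 1  # failure: report a non-zero status to the caller
    finally:
        # Cleanup only: no return here, so nothing gets overridden.
        if conn is not None:
            try:
                conn.close()
            except Exception:
                logger.exception("Error closing the connection")
    return 0


def run():
    # Hand main()'s status to the interpreter as the process exit code.
    sys.exit(main())
```

In the real script the last lines would be the usual if __name__ == "__main__": guard calling run(). With that in place a failure ends the process with exit code 1, which is what restart_policy: on-failure reacts to.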