How to view docker-compose healthcheck logs?
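In my docker-compose.yml I have the following service with a healthcheck section. I want to know whether MariaDB is actually ready to handle queries. A service named cmd is configured to depend on it with condition: service_healthy.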
db:
  image: mariadb:10
  environment:
    MYSQL_RANDOM_ROOT_PASSWORD: 1
    MYSQL_USER: user
    MYSQL_PASSWORD: password
    MYSQL_DATABASE: database
  healthcheck:
    test: ["CMD", "mysql", "--user=user", "--password=password", "--execute='SELECT 1'", "--host=127.0.0.1", "--port=3306"]
    interval: 1s
    retries: 30
This healthcheck does not work: the service is reported as unhealthy.
How can I inspect the output of the test CMD?
You can use:
docker inspect --format "{{json .State.Health }}" <container name> | jq
Output:
{
  "Status": "unhealthy",
  "FailingStreak": 63,
  "Log": [
    {
      "Start": "2017-03-11T20:49:19.668895201+03:30",
      "End": "2017-03-11T20:49:19.735722044+03:30",
      "ExitCode": 1,
      "Output": "ERROR 1064 (42000) at line 1: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ''SELECT 1'' at line 1\n"
    }
  ]
}
Then look at the Output field of each Log entry.
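The ERROR 1064 message near ''SELECT 1'' in that Output suggests that mysql received the single quotes as part of the statement. That would be consistent with the exec form of test (a JSON array starting with "CMD"), which runs the command without a shell, so quote characters inside an array element are passed through literally. The following shell sketch only illustrates that difference; show_args is a hypothetical helper, not part of Docker or MySQL.

```shell
# Print each argument on its own line, wrapped in brackets,
# so that stray quote characters become visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Parsed by a shell: the single quotes are removed before the program sees the argument.
show_args --execute='SELECT 1'
# prints: [--execute=SELECT 1]

# Passed the way an exec-form array element is passed: the quotes stay in the argument.
show_args "--execute='SELECT 1'"
# prints: [--execute='SELECT 1']
```

Under that reading, writing the array element as "--execute=SELECT 1" without the inner quotes would likely remove the syntax error. This is an inference from the logged output, not something the healthcheck log states directly.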
To get only the output:
docker inspect --format "{{json .State.Health }}" mariadb_db_1 | jq '.Log[].Output'
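In swarm mode, use the following format (thanks to @shotgunner for pointing this out):
{{json .Spec.TaskTemplate.ContainerSpec.Healthcheck }}
Feel free to swap jq for whatever tool you use to pretty-print JSON.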
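If jq is not installed, one possible substitute is Python's built-in json.tool module. The sketch below assumes python3 is available on the machine running the docker CLI; pretty_json is a hypothetical helper name. The commented usage lines need a running container, and <container name> is a placeholder.

```shell
# Pretty-print JSON from stdin using Python's standard library instead of jq.
# Assumption: python3 is installed on the host.
pretty_json() {
  python3 -m json.tool
}

# Usage (requires a running container):
#   docker inspect --format '{{json .State.Health}}' <container name> | pretty_json
#
# Without any JSON tool, a Go template can print each check's exit code and output:
#   docker inspect --format '{{range .State.Health.Log}}{{.ExitCode}}: {{.Output}}{{end}}' <container name>
```

The template variant avoids an extra dependency, while the JSON variant keeps the timestamps and the failing streak visible.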
docker-compose ps shows the state of each service, including its health status if a healthcheck is defined. This is useful for a basic overview.
% docker-compose ps
Name                                     Command                         State                  Ports
--------------------------------------------------------------------------------------------------------------------------------------------------
remix-theme-editor_analytics_1           /bin/sh -c /analytics/run. ...  Up
remix-theme-editor_base_1                /bin/bash                       Exit 0
remix-theme-editor_flower_1              /entrypoint --environment ...   Exit 137
remix-theme-editor_frontend_1            /bin/sh -c perl -p -i -e ' ...  Exit 137
remix-theme-editor_js-app_1              npm run                         Exit 0
remix-theme-editor_mq_1                  docker-entrypoint.sh rabbi ...  Up (healthy)           15671/tcp, 15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp
remix-theme-editor_mysql-migration_1     /entrypoint_mysql-migratio ...  Exit 0
remix-theme-editor_mysql_1               /bin/sh -c /entrypoint_wra ...  Up (health: starting)  127.0.0.2:3308->3306/tcp
remix-theme-editor_page-renderer_1       npm run start:watch             Up
remix-theme-editor_python-app_1          /entrypoint                     Exit 2
remix-theme-editor_redis_1               docker-entrypoint.sh /bin/ ...  Up (health: starting)  6379/tcp
remix-theme-editor_scheduler_1           /entrypoint --environment ...   Exit 137
remix-theme-editor_socket_1              /entrypoint --environment ...   Exit 1
remix-theme-editor_static-builder_1      npm run watch                   Up
remix-theme-editor_static-http_1         nginx -g daemon off;            Up                     127.0.0.2:6544->443/tcp, 80/tcp
remix-theme-editor_web_1                 /entrypoint --environment ...   Exit 1
remix-theme-editor_worker_1              /entrypoint --environment ...   Exit 1
remix-theme-editor_worker_screenshots_1  /entrypoint --environment ...   Exit 1
If you need more detail, combine docker inspect with docker-compose ps -q <service-name>.
% docker inspect --format "{{json .State.Health }}" $(docker-compose ps -q mq) | jq
{
  "Status": "starting",
  "FailingStreak": 48,
  "Log": [
    {
      "Start": "2018-10-03T00:40:18.671527745-05:00",
      "End": "2018-10-03T00:40:18.71729051-05:00",
      "ExitCode": -1,
      "Output": "OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"exec: \\\"nc\\\": executable file not found in $PATH\": unknown"
    },
    ...
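The docker inspect plus docker-compose ps -q combination can also be wrapped in a small polling helper that prints the healthcheck log when a service never becomes healthy. This is a minimal sketch, not part of the original answer: health_status, wait_healthy, and the POLL_INTERVAL variable are hypothetical names, and it assumes the service defines a healthcheck (otherwise the template is expected to fail).

```shell
# Print the health status ("starting", "healthy" or "unhealthy") of one compose service.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$(docker-compose ps -q "$1")"
}

# Poll a service until it reports healthy, or give up after a number of attempts.
# On failure, dump the recorded healthcheck log so the failing Output is visible.
wait_healthy() {
  service=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(health_status "$service")
    if [ "$status" = "healthy" ]; then
      echo "$service is healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-1}"
  done
  echo "$service did not become healthy (last status: $status)" >&2
  docker inspect --format '{{json .State.Health}}' "$(docker-compose ps -q "$service")" >&2
  return 1
}

# Usage (requires a running compose project; mq is the service from the example above):
#   wait_healthy mq 60
```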
You can always debug a healthcheck by running its command yourself. For example:
% docker exec -it $(docker-compose ps -q socket) nc -w2 127.0.0.1 5672
(UNKNOWN) [127.0.0.1] 5672 (?) : Connection refused
You can also do the same from a shell inside the container:
% docker exec -it $(docker-compose ps -q socket) bash
root@b5da5207d344:~/src# nc -w2 127.0.0.1 5672
(UNKNOWN) [127.0.0.1] 5672 (?) : Connection refused
root@b5da5207d344:~/src# echo $?
1
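To rerun exactly what Docker runs, it can help to read back the healthcheck command recorded for the container and then check the exit code of a manual run. The sketch below uses hypothetical helper names (healthcheck_cmd, run_check); the service name and the nc command in the usage lines are taken from the example above.

```shell
# Print the healthcheck command as Docker stores it for a container,
# for example an array beginning with "CMD" or "CMD-SHELL".
healthcheck_cmd() {
  docker inspect --format '{{json .Config.Healthcheck.Test}}' "$1"
}

# Run a command and report its exit code.
# A healthcheck counts as passed only when the command exits with 0.
run_check() {
  "$@"
  code=$?
  echo "exit code: $code"
  return "$code"
}

# Usage (requires a running compose project):
#   healthcheck_cmd "$(docker-compose ps -q socket)"
#   run_check docker exec "$(docker-compose ps -q socket)" nc -w2 127.0.0.1 5672
```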
Finally, you can simply run docker-compose up in one terminal window and docker-compose logs -f in another. This displays all logs from the containers managed by docker-compose.
In swarm mode:
- First, on a manager, use docker service ps service_name to find the ID of the failed task and the node it ran on.
manager$ docker service ps service_name
ID             NAME                 IMAGE            NODE   DESIRED STATE   CURRENT STATE               ERROR                              PORTS
liwww3qzg9dz   service_name.1       image_name:1.3   s-3    Running         Running 27 seconds ago
hcgxmwk2efj0    \_ service_name.1   image_name:1.3   s-3    Shutdown        Failed about a minute ago   "task: non-zero exit (137): do…"
In this example, hcgxmwk2efj0 is the task ID and s-3 is the node name.
- Then, on the manager, use docker inspect --format "{{json .Status.ContainerStatus.ContainerID }}" task_id to get the container ID.
manager$ docker inspect --format "{{json .Status.ContainerStatus.ContainerID }}" hcgxmwk2efj0
"412b09d5244047b31471248fd9a0807e5ea42406fb8f5b1701df2244933e30c8"
- Then SSH to that node and use docker inspect --format "{{json .State.Health }}" container_id | jq to get the healthcheck log. (The | jq part is optional in this command.)
s-3$ docker inspect --format "{{json .State.Health }}" 412b09d5244047b31471248fd9a0807e5ea42406fb8f5b1701df2244933e30c8 | jq
{
  "Status": "unhealthy",
  "FailingStreak": 3,
  "Log": [
    {
      "Start": "2021-09-07T06:10:05.233163051Z",
      "End": "2021-09-07T06:10:07.585487343Z",
      "ExitCode": 0,
      "Output": "... log 1 ..."
    },
    {
      "Start": "2021-09-07T06:10:37.644936244Z",
      "End": "2021-09-07T06:10:39.881196276Z",
      "ExitCode": 0,
      "Output": "... log 2 ..."
    },
    {
      "Start": "2021-09-07T06:11:10.16172012Z",
      "End": "2021-09-07T06:11:25.161912411Z",
      "ExitCode": -1,
      "Output": "Health check exceeded timeout (15s)"
    },
    {
      "Start": "2021-09-07T06:11:55.297395088Z",
      "End": "2021-09-07T06:12:10.302928565Z",
      "ExitCode": -1,
      "Output": "Health check exceeded timeout (15s)"
    },
    {
      "Start": "2021-09-07T06:12:40.371234778Z",
      "End": "2021-09-07T06:12:55.371393914Z",
      "ExitCode": -1,
      "Output": "Health check exceeded timeout (15s)"
    }
  ]
}
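The swarm steps above can be sketched as two small helpers. This is only an illustration under stated assumptions: task_container_id, container_health, and the DOCKER variable are hypothetical names, and the usage lines assume the node is reachable over SSH under its node name (s-3 in the example).

```shell
# DOCKER can be overridden, for example to point at a wrapper; it defaults to the docker CLI.
DOCKER=${DOCKER:-docker}

# Step 1 is manual: pick a failed task ID from `docker service ps <service_name>`.

# Step 2 (on a manager): resolve the task ID to its container ID.
task_container_id() {
  $DOCKER inspect --format '{{.Status.ContainerStatus.ContainerID}}' "$1"
}

# Step 3 (on the node that ran the task): print the healthcheck log of that container.
container_health() {
  $DOCKER inspect --format '{{json .State.Health}}' "$1"
}

# Usage from the manager, assuming SSH access to the node:
#   cid=$(task_container_id hcgxmwk2efj0)
#   ssh s-3 docker inspect --format "'{{json .State.Health}}'" "$cid"
```

The container must be inspected on the node that ran it, which is why the last step goes through SSH instead of calling container_health on the manager.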