Why is a failed startupProbe not killing the Pod but allowing it to run?
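I created a startup probe and made it always fail. It should cause the pod to be killed and restarted, but that doesn't happen. I see one event for the startup probe failing (and no events after that), but the pod shows as 1/1 Running. When I run my Helm test, it passes!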
I guaranteed the failure by giving the startup probe check an invalid username and password.
K8s version: 1.19.4
When I check the events, I get:
4m44s Normal SuccessfulCreate replicaset/mysqlpod-5957645967 Created pod: mysqlpod-5957645967-fj95t
4m44s Normal ScalingReplicaSet deployment/mysqlpod Scaled up replica set mysqlpod-5957645967 to 1
4m44s Normal Scheduled pod/mysqlpod-5957645967-fj95t Successfully assigned data-layer/mysqlpod-5957645967-fj95t to minikube
4m43s Normal Created pod/mysqlpod-5957645967-fj95t Created container mysql
4m43s Normal Pulled pod/mysqlpod-5957645967-fj95t Container image "mysql:5.6" already present on machine
4m43s Normal Started pod/mysqlpod-5957645967-fj95t Started container mysql
4m41s Warning Unhealthy pod/mysqlpod-5957645967-fj95t Startup probe failed: Warning: Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Checking the pods (with --watch), I see:
NAME                            READY   STATUS    RESTARTS   AGE
mysql-db-app-5957645967-fj95t   0/1     Running   0          7m18s
mysql-db-app-5957645967-fj95t   1/1     Running   0          7m43s
Note that it has zero restarts.
My deployment has:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mysqlapp.name" . }}
  namespace: {{ quote .Values.metadata.namespace }}
spec:
  replicas: {{ .Values.deploymentSpecs.replicas }}
  selector:
    matchLabels:
      {{- include "mysqlapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "mysqlapp.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - image: "{{ .Values.image.name }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          name: {{ .Values.image.name }}
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: db-password
          ports:
            - containerPort: {{ .Values.ports.containerPort }}
              name: {{ .Values.image.name }}
          startupProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - mysqladmin ping -u wrong -pwrong
            periodSeconds: {{ .Values.startupProbe.periodSeconds }}
            timeoutSeconds: {{ .Values.startupProbe.timeoutSeconds }}
            successThreshold: {{ .Values.startupProbe.successThreshold }}
            failureThreshold: {{ .Values.startupProbe.failureThreshold }}
Note the probe command mysqladmin ping -u wrong -pwrong above.
Values.yaml:
metadata:
  namespace: data-layer
  myprop: value
deploymentSpecs:
  replicas: 1
labels:
  app: db-service
image:
  name: mysql
  pullPolicy: IfNotPresent
  tag: "5.6"
ports:
  containerPort: 3306
startupProbe:
  periodSeconds: 10
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 5
Even after waiting 5 minutes, I can still run the test (which uses a MySQL client to access the database), and it succeeds! Why doesn't this fail?
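It doesn't fail because, as it turns out, the ping command returns a 0 status even when the user/password is wrong, as long as it can reach the server. From the mysqladmin documentation: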
Check whether the server is available. The return status from mysqladmin is 0 if the server is running, 1 if it is not. This is 0 even in case of an error such as Access denied, because this means that the server is running but refused the connection, which is different from the server not running.
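A quick sanity check with the values from Values.yaml confirms this. A startup probe restarts the container after failureThreshold consecutive failures, with one attempt every periodSeconds, and initialDelaySeconds is unset here, so it defaults to 0. A probe that really failed every time would therefore have triggered a restart after roughly:

```latex
t_{\text{restart}} \approx \texttt{failureThreshold} \times \texttt{periodSeconds} = 5 \times 10\,\text{s} = 50\,\text{s}
```

The pod instead shows RESTARTS 0 after more than 7 minutes and only one Unhealthy event. That means the probe began succeeding right after its first failure, which matches the exit-status rule quoted above:

- The first attempt ran before mysqld had created its socket, so the error was "Can't connect", and ping returned 1.
- Once the server was up, ping returned 0 even though the credentials were wrong.

After a startup probe succeeds once, the kubelet stops running it, which is why no further events appear.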
To force a failure and a restart, you could point mysqladmin at a host it cannot reach:
mysqladmin ping -u root -p${MYSQL_ROOT_PASSWORD} --host fake
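Another option, sketched here under the assumption that you want the probe to fail on bad credentials rather than only on an unreachable server, is to swap mysqladmin ping for a trivial query through the mysql client. The client exits non-zero when authentication fails. The block below is a hypothetical replacement for the startupProbe section of the Deployment template above and reuses the same Values keys:

```yaml
# Hypothetical replacement for the startupProbe in the Deployment template above.
# Unlike `mysqladmin ping`, the mysql client exits non-zero when authentication
# fails, so wrong credentials make this probe fail. After failureThreshold
# consecutive failures, the kubelet restarts the container.
startupProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # MYSQL_ROOT_PASSWORD is injected from the db-credentials Secret via env.
      - mysql -u root -p"${MYSQL_ROOT_PASSWORD}" -e 'SELECT 1'
  periodSeconds: {{ .Values.startupProbe.periodSeconds }}
  timeoutSeconds: {{ .Values.startupProbe.timeoutSeconds }}
  successThreshold: {{ .Values.startupProbe.successThreshold }}
  failureThreshold: {{ .Values.startupProbe.failureThreshold }}
```

To repeat the original experiment, replace the credentials with -u wrong -pwrong. With this command, that variant should keep failing and lead to restarts instead of passing.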