Are Kubernetes liveness probe failures voluntary or involuntary disruptions?
I have an application deployed to Kubernetes that depends on an external application. Sometimes the connection between the two gets into an invalid state, and that can only be fixed by restarting my application.
To restart automatically, I configured a liveness probe that verifies the connection.
This has been working great. However, I'm worried that if the external application goes down (so that the connection errors are not just due to an invalid pod state), all of my pods will immediately be restarted and my application will become completely unavailable. I'd like it to stay running so that functionality which does not depend on the bad service can continue.
I'm wondering if a pod disruption budget would prevent this scenario, since it limits the number of pods that can be down due to a "voluntary" disruption. However, the K8s docs don't state whether a liveness probe failure counts as a voluntary disruption. Is it?
I'm wondering if a pod disruption budget would prevent this scenario.
Yes, it would prevent it.
As you noted, when a pod fails (or a node fails), pods cannot avoid becoming unavailable. However, some services require that a minimum number of pods is always kept running.
There may be another way to achieve this (e.g. a Stateful resource), but a PodDisruptionBudget is one of the simplest Kubernetes resources available for it.
Note: you can also use a percentage in the minAvailable field instead of an absolute number. For example, you could declare that 60% of all pods carrying the app=run-always label need to be running at all times.
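For illustration only, a minimal PodDisruptionBudget sketch along those lines could look like this (the app=run-always label and the 60% figure are taken from the note above; the resource name is hypothetical, and older clusters use the policy/v1beta1 API instead of policy/v1):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  # hypothetical name, adjust to your own naming scheme
  name: run-always-pdb
spec:
  # keep at least 60% of the matching pods running at all times
  minAvailable: "60%"
  selector:
    matchLabels:
      app: run-always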
Based on the documentation, I would say:
Voluntary and involuntary disruptions
Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
We call these unavoidable cases involuntary disruptions to an application. Examples are:
- a hardware failure of the physical machine backing the node
- cluster administrator deletes VM (instance) by mistake
- cloud provider or hypervisor failure makes VM disappear
- a kernel panic
- the node disappears from the cluster due to cluster network partition
- eviction of a pod due to the node being out-of-resources.
Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.
We call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator. Typical application owner actions include:
- deleting the deployment or other controller that manages the pod
- updating a deployment's pod template causing a restart
- directly deleting a pod (e.g. by accident)
Cluster administrator actions include:
- Draining a node for repair or upgrade.
- Draining a node from a cluster to scale the cluster down (learn about Cluster Autoscaling).
- Removing a pod from a node to permit something else to fit on that node.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Disruptions
So your example is quite different: as far as I can tell, it is neither a voluntary nor an involuntary disruption.
Also take a look at another part of the Kubernetes documentation:
Pod lifetime
Like individual application containers, Pods are considered to be relatively ephemeral (rather than durable) entities. Pods are created, assigned a unique ID (UID), and scheduled to nodes where they remain until termination (according to restart policy) or deletion. If a Node dies, the Pods scheduled to that node are scheduled for deletion after a timeout period.
Pods do not, by themselves, self-heal. If a Pod is scheduled to a node that then fails, the Pod is deleted; likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a higher-level abstraction, called a controller, that handles the work of managing the relatively disposable Pod instances.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Pod lifetime
Container probes
The kubelet can optionally perform and react to three kinds of probes on running containers (focusing on a livenessProbe):
livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Container probes
When should you use a liveness probe?
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.
If you'd like your container to be killed and restarted if a probe fails, then specify a liveness probe, and specify a restartPolicy of Always or OnFailure.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: When should you use a liveness probe
Based on this information, the best approach would be to create a custom liveness probe that takes both the internal process health check and the external dependency (liveness) health check into account. In the first case (an internal problem), the probe should cause your container to be stopped/restarted; in the second case (only the external dependency is failing), it should not.
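As a rough sketch of that idea (not part of the original answer): assume the application exposes a hypothetical local endpoint http://localhost:8080/healthz/local that reports only the pod's own connection state, that the external application has a hypothetical health endpoint at http://external-app.example.com/healthz, and that curl is present in the image. The probe fails only when the local check fails while the external application itself is still reachable, so an outage of the external application alone does not restart all the pods:
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # 1. Local connection state is fine -> report healthy.
      # 2. Local check fails but the external app is reachable -> the problem
      #    is local, so fail the probe and let the kubelet restart the container.
      # 3. External app is unreachable too -> report healthy, so the pods are
      #    not all restarted during an external outage.
      - 'curl -sf --max-time 5 http://localhost:8080/healthz/local && exit 0; curl -sf --max-time 5 http://external-app.example.com/healthz && exit 1; exit 0'
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 15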
To answer the question:
I'm wondering if a pod disruption budget would prevent this scenario.
In this particular scenario, a PDB will not help.
I'd also like to give more visibility to a comment and to the additional resources used for this question, as they may be useful to other community members:
Tested it with a PodDisruptionBudget. The Pods are still restarted at the same time.
Example:
https://github.com/AlphaWong/PodDisruptionBudgetAndPodProbe
Yes. Like @Dawid Kruk said, you should create a customized script like the one below:
# something like this
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # generate a random number for sleep
      - 'SLEEP_TIME=$(shuf -i 2-40 -n 1);sleep $SLEEP_TIME; curl -L --max-time 5 -f nginx2.default.svc.cluster.local'
  initialDelaySeconds: 10
  # think about the gap between each call
  periodSeconds: 30
  # it is required after k8s v1.12
  timeoutSeconds: 90
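The random sleep at the start of the command staggers the probe checks across replicas, so that when the shared dependency (nginx2.default.svc.cluster.local in this example) goes down, the pods do not all fail their liveness probes and restart at exactly the same moment. Note that this is an exec probe, so shuf and curl have to be available inside the container image.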