STONITH for Kubernetes
Does Kubernetes support STONITH for hardware nodes? We have smart power outlets that expose an API call to power off a server, and they work well with Pacemaker.
Does Kubernetes support STONITH?
For reference, STONITH is defined as follows:
STONITH ("Shoot The Other Node In The Head" or "Shoot The Offending Node In The Head"), sometimes called STOMITH ("Shoot The Other Member/Machine In The Head"), is a technique for fencing in computer clusters.
Fencing is the isolation of a failed node so that it does not cause disruption to a computer cluster. As its name suggests, STONITH fences failed nodes by resetting or powering down the failed node.
This was actually discussed in kubernetes/kops issue 2002:
I think we should take a look at the autoscaler and I think we could default to Reboot, perhaps configurable in the manifest to AllowTermination.
However, that issue has since gone stale.
It is also described in kubernetes/community/contributors/design-proposals/storage/pod-safety.md:
In order to reconcile partitions, an actor (human or automated) must decide when the partition is unrecoverable. The actor may be informed of the failure in an unambiguous way (e.g. the node was destroyed by a meteor) allowing for certainty that the processes on that node are terminated, and thus may resolve the partition by deleting the node and the pods on the node.
Alternatively, the actor may take steps to ensure the partitioned node cannot return to the cluster or access shared resources - this is known as fencing and is a well understood domain.
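The two paths in the quoted proposal (be certain the node is dead, then delete it) can be combined into a small "fence, then delete" routine. The following is only a minimal sketch, not something Kubernetes ships: `fence_then_delete`, `FenceResult`, `power_off`, and `delete_node` are hypothetical names introduced here for illustration. It assumes `power_off` wraps whatever API your smart power outlet offers and returns `True` only when the outlet confirms the server is off, and that `delete_node` removes the node object from the cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FenceResult:
    """Outcome of one fencing attempt (hypothetical helper type)."""
    node: str
    fenced: bool = False
    deleted: bool = False
    log: List[str] = field(default_factory=list)


def fence_then_delete(
    node: str,
    power_off: Callable[[str], bool],    # assumption: wraps the outlet's power-off API
    delete_node: Callable[[str], None],  # assumption: removes the node from the cluster
    attempts: int = 3,
) -> FenceResult:
    """Power off `node` and delete it only after the power-off is confirmed."""
    result = FenceResult(node=node)

    for attempt in range(1, attempts + 1):
        try:
            confirmed = power_off(node)
        except Exception as exc:
            # A failed call tells us nothing about the node's state; try again.
            result.log.append(f"attempt {attempt}: power-off error: {exc}")
            continue
        if confirmed:
            result.fenced = True
            result.log.append(f"attempt {attempt}: power-off confirmed")
            break
        result.log.append(f"attempt {attempt}: power-off not confirmed")

    if not result.fenced:
        # Without confirmation the node may still be running and touching
        # shared resources, so the node object is left in place.
        return result

    delete_node(node)
    result.deleted = True
    return result
```

In a real setup, `power_off` might issue an HTTP request to the outlet and `delete_node` might run `kubectl delete node <name>`; both are left as injected callables here because the outlet's API is not described in the question. The point of the ordering is that the node is deleted only once there is certainty that its processes have stopped.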