More than specified number of pods created on deleting a pod

Question

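I have deployed HashiCorp's Vault on a Kubernetes cluster on AWS using the Helm chart.

The number of replicas in the deployment is set to 3.

Of these 3 pods, 1 is ready (1/1), while the other two replica pods are not ready (0/1). I killed the ready pod, and although I expected Kubernetes to deploy one new pod to replace it, it deployed two new pods.

Now I have two ready pods and two pods that are not ready. When I delete one of these pods, Kubernetes recreates only one pod. As a result, my Vault deployment is running 4 pods instead of 3. What could be the reason behind this, and how can we prevent it?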

Answer 1

When you run into a problem like this, you should run

kubectl describe pod <PROBLEMATIC_POD>

and look at the Events section in the lower part of the output.

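Some possible reasons why your pods are not starting:

No node has enough free resources to satisfy your requests
No volumes are available
Anti-affinity rules combined with too few nodes, so the scheduler cannot place your pods on any node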

Answer 2

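Your deployment isn't working because HA (high availability) is not available when using the s3 storage backend. You will need HashiCorp's Consul, AWS's DynamoDB, or a different backend provider. If you want to stick with the s3 backend, change the number of replicas to 1.

As for why you are seeing 4 pods instead of 3, you need to provide more details. Paste the output of kubectl get pods -l app=vault and kubectl describe deploy -l app=vault, and I will update this answer.

For what it's worth, I can only speculate. For Deployment objects there is a maxSurge property that allows a rolling update to scale beyond the desired number of replicas. It defaults to 25%, rounded up, which in your case would be 1 additional pod.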
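While gathering those details, it can help to boil the pod list down to a per-pod readiness count. The following is a minimal Python sketch, assuming the JSON form of the pod list (as produced by adding -o json to the kubectl get pods command). The SAMPLE data and the pod names in it are hypothetical, and readiness_by_pod is an illustrative helper, not part of any Kubernetes tooling.

```python
import json

# Hypothetical sample shaped like `kubectl get pods -l app=vault -o json` output,
# trimmed to the fields used below. Pod names are made up for illustration.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "vault-a"},
         "spec": {"containers": [{"name": "vault"}]},
         "status": {"containerStatuses": [{"name": "vault", "ready": True}]}},
        {"metadata": {"name": "vault-b"},
         "spec": {"containers": [{"name": "vault"}]},
         "status": {"containerStatuses": [{"name": "vault", "ready": False}]}},
        # A pod that has not been scheduled yet may have no containerStatuses.
        {"metadata": {"name": "vault-c"},
         "spec": {"containers": [{"name": "vault"}]},
         "status": {}},
    ]
})


def readiness_by_pod(pod_list_json):
    """Map each pod name to (ready containers, total containers)."""
    result = {}
    for pod in json.loads(pod_list_json).get("items", []):
        name = pod["metadata"]["name"]
        total = len(pod["spec"]["containers"])
        statuses = pod.get("status", {}).get("containerStatuses", [])
        ready = sum(1 for status in statuses if status.get("ready"))
        result[name] = (ready, total)
    return result


for pod_name, (ready, total) in readiness_by_pod(SAMPLE).items():
    print(f"{pod_name}: {ready}/{total}")
```

To use it on a real cluster, the string passed to readiness_by_pod would be the JSON output of the pod listing rather than SAMPLE.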

Max Surge

.spec.strategy.rollingUpdate.maxSurge is an optional field that specifies the maximum number of Pods that can be created over the desired number of Pods. The value can be an absolute number (for example, 5) or a percentage of desired Pods (for example, 10%). The value cannot be 0 if MaxUnavailable is 0. The absolute number is calculated from the percentage by rounding up. The default value is 25%.

For example, when this value is set to 30%, the new ReplicaSet can be scaled up immediately when the rolling update starts, such that the total number of old and new Pods does not exceed 130% of desired Pods. Once old Pods have been killed, the new ReplicaSet can be scaled up further, ensuring that the total number of Pods running at any time during the update is at most 130% of desired Pods.
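To make the rounding concrete, here is a small Python sketch of the arithmetic described in the quoted documentation: a percentage maxSurge is converted to an absolute number by rounding up. The function name max_surge_pods is hypothetical and only illustrates the calculation; it is not how Kubernetes itself is implemented.

```python
from typing import Union


def max_surge_pods(desired_replicas: int, max_surge: Union[int, str] = "25%") -> int:
    """Return how many pods above the desired count maxSurge allows.

    max_surge is either an absolute number or a percentage string such as "25%".
    """
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        percent = int(max_surge[:-1])
        # Round up using integer arithmetic to avoid floating-point surprises.
        return (desired_replicas * percent + 99) // 100
    return int(max_surge)


desired = 3
surge = max_surge_pods(desired)  # default of 25%
print(f"extra pods allowed: {surge}")
print(f"upper bound on total pods: {desired + surge}")
```

With 3 desired replicas and the default of 25%, 0.75 rounds up to 1, so up to 4 pods may exist at once, which matches the count reported in the question.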

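Deleting the one Running (1/1) pod while the other pods were in a NotReady state may have put your Deployment into a "rolling update" or similar state, which allows the Deployment to scale up to its maxSurge setting.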

kubernetes

hashicorp-vault

kubernetes-helm