Pod stays in pending state due to failed scheduling

Question

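I am new to Kubernetes and I am trying to get a deployment running.

After I push the deployment config, the ReplicaSet is created and it creates a pod. But the pod stays in the Pending state.

The pod has an event listed saying that it can't be scheduled because there are no nodes available. Output of kubectl describe pod foo-qa-1616599440: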

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  25m    default-scheduler  0/6 nodes are available: 6 Insufficient pods.
  Warning  FailedScheduling  18m    default-scheduler  0/6 nodes are available: 6 Insufficient pods.
  Warning  FailedScheduling  11m    default-scheduler  0/6 nodes are available: 6 Insufficient pods.
  Warning  FailedScheduling  5m18s  default-scheduler  0/6 nodes are available: 6 Insufficient pods.

But there are nodes available. Output of kubectl get nodes:

NAME                                             STATUS   ROLES    AGE   VERSION
ip-xxx-xx-xx-xx.eu-central-1.compute.internal   Ready    <none>   64d   v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal   Ready    <none>   64d   v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal   Ready    <none>   54d   v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal   Ready    <none>   64d   v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal   Ready    <none>   54d   v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal   Ready    <none>   64d   v1.17.12-eks-7684af

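Another thing I noticed is that many identical jobs are being created, and all of them are Pending. I don't know whether this is normal behavior, but there are over 200 of them and the number keeps increasing. Output of kubectl get jobs: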

...
cron-foo-qa-1616598720           0/1           17m        17m
cron-foo-qa-1616598780           0/1           16m        16m
cron-foo-qa-1616598840           0/1           15m        15m
cron-foo-qa-1616598900           0/1           14m        14m
cron-foo-qa-1616598960           0/1           13m        13m
cron-foo-qa-1616599020           0/1           12m        12m
cron-foo-qa-1616599080           0/1           11m        11m
cron-foo-qa-1616599200           0/1           9m2s       9m2s
cron-foo-qa-1616599260           0/1           8m4s       8m4s
cron-foo-qa-1616599320           0/1           7m7s       7m7s
cron-foo-qa-1616599380           0/1           6m11s      6m12s
cron-foo-qa-1616599440           0/1           5m1s       5m1s
cron-foo-qa-1616599500           0/1           4m4s       4m4s
cron-foo-qa-1616599560           0/1           3m6s       3m6s
cron-foo-qa-1616599620           0/1           2m10s      2m10s
cron-foo-qa-1616599680           0/1           74s        74s
cron-foo-qa-1616599740           0/1                      2s

If I am reading the events list correctly, some scheduling is happening. Output of kubectl get events --sort-by='.metadata.creationTimestamp':

...
3s          Warning   FailedScheduling                  pod/cron-foobar-prod-1616590260-vwqsk           0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-acc-1616590260-j29vx            0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-prod-1616569560-g8mn2           0/6 nodes are available: 6 Insufficient pods.
3s          Normal    Scheduled                         pod/cron-foobar-acc-1616560380-6x88z            Successfully assigned middleware/cron-foobar-acc-1616560380-6x88z to ip-xxx-xxx-xxx-xxx.eu-central-1.compute.internal
3s          Warning   FailedScheduling                  pod/cron-foobar-prod-1616596560-hx895         0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-prod-1616598180-vwls2         0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-qa-1616536260-vh7bl           0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-acc-1616571840-68l54            0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-qa-1616564760-4wg7l           0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-prod-1616571840-7wmlc           0/6 nodes are available: 6 Insufficient pods.
3s          Normal    Started                           pod/cron-foobar-prod-1616564700-6gk58         Started container cron
3s          Warning   FailedScheduling                  pod/cron-foobar-acc-1616587260-hrcmq            0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-qa-1616595720-x5njq           0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-acc-1616525820-x5vhr            0/6 nodes are available: 6 Insufficient pods.
3s          Warning   FailedScheduling                  pod/cron-foobar-qa-1616558100-x4p96           0/6 nodes are available: 6 Insufficient pods.

Can anyone point me in the right direction?

Answer 1

But the pod stays in the Pending state.

The pod has an event listed that it can't be scheduled because there are no nodes available.

That is expected if you have reached capacity. You can check the capacity of any node with:

kubectl describe node <node_name>

To get the node names, use:

kubectl get nodes

To mitigate this, use more nodes or fewer pods, or configure the cluster so that it can autoscale when this happens.
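The "Insufficient pods" message refers to the per-node limit on the number of pods, not to CPU or memory, so the number to compare is the pod count on each node against that node's allocatable pods value. A minimal sketch of that comparison follows. It assumes the two inputs are the parsed JSON from kubectl get nodes -o json and kubectl get pods --all-namespaces -o json; the function name pod_slots and the sample data are hypothetical.

```python
from collections import Counter


def pod_slots(nodes: dict, pods: dict) -> dict:
    """Return {node_name: (pods_in_use, allocatable_pods)}."""
    in_use = Counter()
    for pod in pods["items"]:
        node_name = pod["spec"].get("nodeName")
        phase = pod.get("status", {}).get("phase")
        # Unscheduled pods have no nodeName; finished pods (Succeeded or
        # Failed) are assumed not to count against the node's pod limit.
        if node_name and phase not in ("Succeeded", "Failed"):
            in_use[node_name] += 1

    result = {}
    for node in nodes["items"]:
        name = node["metadata"]["name"]
        # The API reports allocatable pods as a string, e.g. "17".
        allocatable = int(node["status"]["allocatable"]["pods"])
        result[name] = (in_use[name], allocatable)
    return result


# Hypothetical sample data, shaped like the kubectl JSON output.
sample_nodes = {
    "items": [
        {"metadata": {"name": "node-a"}, "status": {"allocatable": {"pods": "17"}}}
    ]
}
sample_pods = {
    "items": [
        {"spec": {"nodeName": "node-a"}, "status": {"phase": "Running"}},
        {"spec": {"nodeName": "node-a"}, "status": {"phase": "Succeeded"}},
        {"spec": {}, "status": {"phase": "Pending"}},
    ]
}

print(pod_slots(sample_nodes, sample_pods))  # {'node-a': (1, 17)}
```

With real data, save the two kubectl outputs to files and pass the results of json.load to pod_slots. A node where the first number equals the second has no free pod slots, regardless of how much CPU or memory is left.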

Answer 2

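Things to try:

Delete all the pods created by the jobs. kubectl delete --all pods --namespace=foo deletes all pods in the specified namespace. Also consider deleting the job itself, which is apparently misconfigured. A job can be configured to stop spawning pods after a defined number of failures or successes. Check backoffLimit and restartPolicy in the Kubernetes Job documentation.
Check taints and tolerations. Describe your node with kubectl describe node <node_name> and look at the Taints: section. If there are any taints, you will have to reflect them in your job's tolerations. Also check for "MemoryPressure" or similar conditions, which are listed in the node description as well.
Check resource usage with kubectl top nodes to see how much RAM and CPU is still available.
Check whether the container image can be pulled. Try pulling it with Docker to make sure it works and does not time out.
Check all network policies with kubectl get netpol -A to make sure that none of them blocks communication with the pods in kube-system.
You could also check the RBAC configuration, but that is a bit of a long shot.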
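The first suggestion above (limiting how many pods a job spawns) can be sketched as a CronJob manifest. The example below builds it as a Python dict and prints JSON, which kubectl apply -f accepts. Everything specific here is an assumption for illustration: the every-minute schedule is inferred from the job timestamps in the question, the image name is a placeholder, and the numeric limits are arbitrary. concurrencyPolicy and activeDeadlineSeconds are not mentioned in the answer; they are added as one possible way to keep pending jobs from piling up.

```python
import json


def cronjob_manifest(name: str, image: str, schedule: str) -> dict:
    """Build a CronJob manifest that limits retries and overlapping runs."""
    return {
        # CronJob lives in batch/v1beta1 on a 1.17 cluster (batch/v1 from 1.21).
        "apiVersion": "batch/v1beta1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Skip a run while the previous job is still active.
            "concurrencyPolicy": "Forbid",
            # Give up on a run that could not start within 30 seconds.
            "startingDeadlineSeconds": 30,
            # Keep only the most recent finished jobs around.
            "successfulJobsHistoryLimit": 1,
            "failedJobsHistoryLimit": 1,
            "jobTemplate": {
                "spec": {
                    # Stop retrying after two failed pods.
                    "backoffLimit": 2,
                    # Fail the job if it has not finished after 5 minutes.
                    "activeDeadlineSeconds": 300,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [{"name": "cron", "image": image}],
                        }
                    },
                }
            },
        },
    }


# Hypothetical values: the image name is a placeholder.
manifest = cronjob_manifest("cron-foo-qa", "example.invalid/foo:latest", "* * * * *")
print(json.dumps(manifest, indent=2))
```

Piping the script's output into kubectl apply -f - with the right --namespace would apply it. The limits are a starting point to tune, not recommended values.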

Answer 3

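The first thing you can do is check your cluster autoscaler, RBAC, and the roles attached to your nodes (you may be missing some permissions there). For the cron jobs, check restartPolicy.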
