Pod stays in Pending state due to failed scheduling
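I'm new to Kubernetes and I'm trying to get a deployment running.
After I pushed a deployment config, the ReplicaSet is created and the pod is created as well. But the pod stays in the Pending state.
The pod has an event listed saying it can't be scheduled because there are no nodes available. Output of kubectl describe pod foo-qa-1616599440: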
Events:
Type     Reason            Age    From               Message
----     ------            ----   ----               -------
Warning  FailedScheduling  25m    default-scheduler  0/6 nodes are available: 6 Insufficient pods.
Warning  FailedScheduling  18m    default-scheduler  0/6 nodes are available: 6 Insufficient pods.
Warning  FailedScheduling  11m    default-scheduler  0/6 nodes are available: 6 Insufficient pods.
Warning  FailedScheduling  5m18s  default-scheduler  0/6 nodes are available: 6 Insufficient pods.
But there are nodes available. Output of kubectl get nodes:
NAME STATUS ROLES AGE VERSION
ip-xxx-xx-xx-xx.eu-central-1.compute.internal Ready <none> 64d v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal Ready <none> 64d v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal Ready <none> 54d v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal Ready <none> 64d v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal Ready <none> 54d v1.17.12-eks-7684af
ip-xxx-xx-xx-xx.eu-central-1.compute.internal Ready <none> 64d v1.17.12-eks-7684af
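Another thing I noticed is that many identical jobs are being created, and all of them are stuck in Pending. I don't know whether this is normal behaviour, but there are more than 200 of them and the number keeps growing. Output of kubectl get jobs: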
...
cron-foo-qa-1616598720 0/1 17m 17m
cron-foo-qa-1616598780 0/1 16m 16m
cron-foo-qa-1616598840 0/1 15m 15m
cron-foo-qa-1616598900 0/1 14m 14m
cron-foo-qa-1616598960 0/1 13m 13m
cron-foo-qa-1616599020 0/1 12m 12m
cron-foo-qa-1616599080 0/1 11m 11m
cron-foo-qa-1616599200 0/1 9m2s 9m2s
cron-foo-qa-1616599260 0/1 8m4s 8m4s
cron-foo-qa-1616599320 0/1 7m7s 7m7s
cron-foo-qa-1616599380 0/1 6m11s 6m12s
cron-foo-qa-1616599440 0/1 5m1s 5m1s
cron-foo-qa-1616599500 0/1 4m4s 4m4s
cron-foo-qa-1616599560 0/1 3m6s 3m6s
cron-foo-qa-1616599620 0/1 2m10s 2m10s
cron-foo-qa-1616599680 0/1 74s 74s
cron-foo-qa-1616599740 0/1 2s
If I'm reading the event list correctly, some scheduling does happen. Output of kubectl get events --sort-by='.metadata.creationTimestamp':
...
3s Warning FailedScheduling pod/cron-foobar-prod-1616590260-vwqsk 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-acc-1616590260-j29vx 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-prod-1616569560-g8mn2 0/6 nodes are available: 6 Insufficient pods.
3s Normal Scheduled pod/cron-foobar-acc-1616560380-6x88z Successfully assigned middleware/cron-foobar-acc-1616560380-6x88z to ip-xxx-xxx-xxx-xxx.eu-central-1.compute.internal
3s Warning FailedScheduling pod/cron-foobar-prod-1616596560-hx895 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-prod-1616598180-vwls2 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-qa-1616536260-vh7bl 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-acc-1616571840-68l54 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-qa-1616564760-4wg7l 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-prod-1616571840-7wmlc 0/6 nodes are available: 6 Insufficient pods.
3s Normal Started pod/cron-foobar-prod-1616564700-6gk58 Started container cron
3s Warning FailedScheduling pod/cron-foobar-acc-1616587260-hrcmq 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-qa-1616595720-x5njq 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-acc-1616525820-x5vhr 0/6 nodes are available: 6 Insufficient pods.
3s Warning FailedScheduling pod/cron-foobar-qa-1616558100-x4p96 0/6 nodes are available: 6 Insufficient pods.
Can anyone point me in the right direction?
But the pod stays in the Pending state.
The pod has an event listed that it can't be scheduled because there are no nodes available.
This is expected if you have reached capacity. You can check the capacity of any node with:
kubectl describe node <node_name>
To get the node names, use:
kubectl get nodes
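"Insufficient pods" refers to the per-node pod count limit rather than to CPU or memory, so it can help to compare each node's allocatable pod slots with the number of pods already bound to it. The following is a minimal sketch of that comparison; the function names `count_per_node` and `show_pod_usage` are made up for illustration, and `show_pod_usage` needs access to a live cluster.

```shell
#!/bin/sh
# count_per_node (hypothetical helper): reads one node name per line on stdin
# and prints "<node> <pod count>", busiest node first.
count_per_node() {
  sort | uniq -c | sort -rn | awk '{print $2, $1}'
}

# show_pod_usage (hypothetical helper): prints the allocatable pod slots of
# every node, then the number of pods currently bound to each node.
# Pods that are not scheduled yet are counted under "<none>".
show_pod_usage() {
  kubectl get nodes \
    -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns=NODE:.spec.nodeName | count_per_node
}
```

Running `show_pod_usage` and seeing counts equal to the allocatable value on all six nodes would match the "6 Insufficient pods" message in the question.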
To mitigate this, use more nodes or fewer pods, or configure the cluster so that it can autoscale when this happens.
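When deciding between more nodes and fewer pods, it is useful to know how many pods one node can hold. The node versions in the question point to EKS; assuming the default AWS VPC CNI without prefix delegation (an assumption, the question does not say so), the limit is derived from the instance type's network interfaces as ENIs × (IPv4 addresses per ENI − 1) + 2. A small sketch of that arithmetic, with `eks_max_pods` as an illustrative name:

```shell
#!/bin/sh
# eks_max_pods (hypothetical helper): per-node pod limit under the default
# AWS VPC CNI. $1 = number of ENIs, $2 = IPv4 addresses per ENI.
eks_max_pods() {
  enis=$1
  ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

# Example: an instance type with 3 ENIs and 6 addresses per ENI.
eks_max_pods 3 6
```

With six such nodes the cluster would top out at about a hundred pods, which hundreds of pending cron pods exceed quickly. The authoritative value for a given node is the `pods` entry under Allocatable in the kubectl describe node output.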
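Things to try:
Delete all pods created by the jobs. kubectl delete --all pods --namespace=foo deletes all pods in the given namespace. Also consider deleting the jobs themselves, since they are evidently missing some configuration. A job can be configured to stop spawning pods after a defined number of failures or successes. Check backoffLimit and restartPolicy in the Kubernetes Job documentation.
Check taints and tolerations. Describe your node with kubectl describe node <node_name> and look at the Taints: section. If there are taints, you will have to reflect them in the tolerations of your job. Also check for "MemoryPressure" or similar conditions, which are listed in the node description as well.
Check the available resources with kubectl top nodes. Look at the available RAM and CPU.
Check that the container image can be pulled. Try pulling it with Docker to make sure it works and does not time out.
Check all network policies with kubectl get netpol -A to make sure that no policy blocks communication with the pods in kube-system.
You could also check the RBAC configuration, but that is a bit of a stretch.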
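Before deleting anything, it may be worth seeing which CronJob is producing the backlog and how it is configured. The sketch below groups Job names by their CronJob prefix and prints the settings that decide whether runs pile up. The function names are hypothetical, the grouping assumes Job names end in a numeric suffix as in the question's output, and `show_cronjob_settings` needs a live cluster.

```shell
#!/bin/sh
# jobs_per_cronjob (hypothetical helper): reads Job names on stdin, strips the
# trailing "-<digits>" suffix, and prints "<cronjob> <job count>".
jobs_per_cronjob() {
  sed -E 's/-[0-9]+$//' | LC_ALL=C sort | uniq -c | awk '{print $2, $1}'
}

# show_cronjob_settings (hypothetical helper): prints schedule,
# concurrencyPolicy, backoffLimit and restartPolicy, one per line.
# $1 = CronJob name, $2 = namespace.
show_cronjob_settings() {
  kubectl get cronjob "$1" --namespace "$2" \
    -o jsonpath='{.spec.schedule}{"\n"}{.spec.concurrencyPolicy}{"\n"}{.spec.jobTemplate.spec.backoffLimit}{"\n"}{.spec.jobTemplate.spec.template.spec.restartPolicy}{"\n"}'
}
```

For example, `kubectl get jobs --namespace=foo --no-headers -o custom-columns=NAME:.metadata.name | jobs_per_cronjob` lists how many Jobs each CronJob has left behind. A schedule that fires every minute with concurrencyPolicy left at Allow keeps creating new Jobs even while earlier ones are still pending, which would be consistent with the growing list in the question.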
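The first thing you can do is check your cluster autoscaler, RBAC, and the roles attached to your nodes (you may be missing some permissions there).
For CronJobs, check restartPolicy.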