Kubernetes UnexpectedAdmissionError after rollout
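I have a service that fails to reply to some HTTP requests. Digging through its logs, it looks like some kind of DNS failure when reaching the proxy service: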
'proxy' failed to resolve 'proxy.default.svc.cluster.local' after 2 queries
I couldn't find anything wrong, so I tried kubectl rollout restart deployment/backend.
Right after that, these showed up in the pod list:
backend-54769cbb4-xkwf2 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-xlpgf 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-xmnr5 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-xmq5n 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-xphrw 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-xrmrq 0/1 UnexpectedAdmissionError 0 4h1m
backend-54769cbb4-xrmw8 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-xt4ck 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-xws8r 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-xx6r4 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-xxpfd 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-xzjql 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-xzzlk 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-z46ms 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-z4sl7 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-z6jpj 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-z6ngq 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-z8w4h 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-z9jqb 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zbvqm 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zcfxg 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zcvqm 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-zf2f8 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zgnkh 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-zhdr8 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zhx6g 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-zj8f2 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zjbwp 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-zjc8g 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zjdcp 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-zkcrb 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-zlpll 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zm2cx 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-zn7mr 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-znjkp 0/1 UnexpectedAdmissionError 0 4h3m
backend-54769cbb4-zpnk7 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zrrl7 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zsdsz 0/1 UnexpectedAdmissionError 0 4h4m
backend-54769cbb4-ztdx8 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-ztln6 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-ztplg 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-ztzfh 0/1 UnexpectedAdmissionError 0 4h2m
backend-54769cbb4-zvb8g 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-zwsr8 0/1 UnexpectedAdmissionError 0 4h7m
backend-54769cbb4-zwvxr 0/1 UnexpectedAdmissionError 0 4h5m
backend-54769cbb4-zwx6h 0/1 UnexpectedAdmissionError 0 4h6m
backend-54769cbb4-zz4bf 0/1 UnexpectedAdmissionError 0 4h1m
backend-54769cbb4-zzq6t 0/1 UnexpectedAdmissionError 0 4h2m
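(and many more)
So I added two more nodes, and now everything seems fine, except for this long list of pods stuck in an error state I don't understand. What is UnexpectedAdmissionError, and what should I do about it?
Note: this is a DigitalOcean cluster.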
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:38:36Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
The following seems important (from kubectl describe on one of the failed pods):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m51s default-scheduler Successfully assigned default/backend-549f576d5f-xzdv4 to std-16gb-g7mo
Warning UnexpectedAdmissionError 2m51s kubelet, std-16gb-g7mo Update plugin resources failed due to failed to write checkpoint file "kubelet_internal_checkpoint": write /var/lib/kubelet/device-plugins/.543592130: no space left on device, which is unexpected.
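I had the same problem. When describing one of the pods in UnexpectedAdmissionError, I saw the following:
Update plugin resources failed due to failed to write deviceplugin checkpoint file "kubelet_internal_checkpoint": write /var/lib/kubelet/device-plugins/.525608957: no space left on device, which is unexpected.
And when describing the node:
OutOfDisk Unknown Tue, 30 Jun 2020 14:07:04 -0400 Tue, 30 Jun 2020 14:12:05 -0400 NodeStatusUnknown Kubelet stopped posting node status.
I fixed it by rebooting the node.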
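Both messages above point at a full filesystem under /var/lib/kubelet, so before (or after) rebooting it may be worth confirming which mount is actually full. The following is only a rough sketch, not something from the original posts: the helper name `full_mounts` and the 90% default are illustrative assumptions. It reads POSIX-format `df -P` output and prints the mount points at or above a usage threshold. Because "no space left on device" can also mean the filesystem ran out of inodes, the same filter can be applied to `df -Pi` on a Linux node.

```shell
# full_mounts [THRESHOLD]   (hypothetical helper, name chosen for illustration)
# Reads `df -P` (or `df -Pi`) output on stdin and prints every mount point
# whose use percentage (column 5) is at or above THRESHOLD (default 90).
full_mounts() {
  awk -v limit="${1:-90}" '
    NR == 1 { next }            # skip the header line
    {
      pct = $5
      sub(/%$/, "", pct)        # "97%" -> "97"
      if (pct + 0 >= limit + 0) print $6, $5
    }'
}

# On the affected node (over SSH or a node shell):
#   df -P  /var/lib/kubelet | full_mounts 90    # block usage
#   df -Pi /var/lib/kubelet | full_mounts 90    # inode usage
```

If the mount holding /var/lib/kubelet shows up here, freeing space (old images, logs) is likely a more durable fix than a reboot alone.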
Because the pod never even started, you can't actually check its logs. Describing the pod, however, gave me the error. We had some disk/CPU/memory utilization issues on the worker5 node.
kubectl get pods | grep -i err
kube-system coredns-autoscaler-79599b9dc6-6l8s8 0/1 UnexpectedAdmissionError 0 10h <none> worker5 <none> <none>
kube-system coredns-autoscaler-79599b9dc6-kzt9z 0/1 UnexpectedAdmissionError 0 10h <none> worker5 <none> <none>
kube-system coredns-autoscaler-79599b9dc6-tgkrc 0/1 UnexpectedAdmissionError 0 10h <none> worker5 <none> <none>
kubectl describe pod -n kube-system coredns-autoscaler-79599b9dc6-kzt9z
Reason: UnexpectedAdmissionError
Message: Pod Allocate failed due to failed to write checkpoint file "kubelet_internal_checkpoint": mkdir /var: file exists, which is unexpected
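In the listing above every rejected pod landed on worker5, which is what singles out the node. When the list is long, a small filter can do the counting. This is a sketch under assumptions: the function name `rejected_by_node` is made up for illustration, and it relies on the `-o wide` column order (NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE ...) with a single-token RESTARTS value, as in the output shown here.

```shell
# rejected_by_node   (hypothetical helper, name chosen for illustration)
# Reads `kubectl get pods -A -o wide --no-headers` output on stdin and prints
# "<count> <node>" for pods whose STATUS column is UnexpectedAdmissionError.
# Assumes NODE is the 8th whitespace-separated column.
rejected_by_node() {
  awk '$4 == "UnexpectedAdmissionError" { n[$8]++ }
       END { for (node in n) print n[node], node }' | sort -rn
}

# Usage against a live cluster:
#   kubectl get pods -A -o wide --no-headers | rejected_by_node
```

A node that accounts for most of the rejected pods is the one to inspect or reboot first.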
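The first step was to reboot the node, which fixed the problem. The cause was that we had restored some backups onto a new cluster, and the restore process triggered this issue.
Because the pods belong to a ReplicaSet, replacements were created on other worker nodes, so we simply deleted the failed pods.
To delete a large number of pods quickly, you can use: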
kubectl get pods -n namespace | grep -i Error | cut -d' ' -f 1 | xargs kubectl delete pod -n namespace
To delete all errored pods across the whole cluster:
kubectl get pods -A | grep -i Error | awk '{print $1, $2}' | xargs -n 2 kubectl delete pod -n
The -A/--all-namespaces flag lists pods from every namespace in the cluster.
If the pods are not recreated automatically, which would be odd, you can run kubectl replace:
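One caveat with `grep -i Error` is that it matches anywhere on the line, so a healthy pod whose name happens to contain "error" would be caught too. A slightly stricter variant, sketched here with a hypothetical helper name `rejected_pods`, compares the STATUS column exactly and emits namespace/name pairs so the delete runs in the right namespace.

```shell
# rejected_pods   (hypothetical helper, name chosen for illustration)
# Reads `kubectl get pods -A --no-headers` output on stdin and prints
# "<namespace> <name>" for pods whose STATUS is exactly UnexpectedAdmissionError.
rejected_pods() {
  awk '$4 == "UnexpectedAdmissionError" { print $1, $2 }'
}

# Dry run first: print the delete commands instead of executing them.
#   kubectl get pods -A --no-headers | rejected_pods | xargs -n 2 echo kubectl delete pod -n
# When the list looks right, drop the `echo`:
#   kubectl get pods -A --no-headers | rejected_pods | xargs -n 2 kubectl delete pod -n
```

Pods rejected at admission are normally reported in the Failed phase, so `kubectl get pods -A --field-selector=status.phase=Failed` may be a simpler filter, though it also matches pods that failed for other reasons.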
kubectl get pod coredns-autoscaler-79599b9dc6-6l8s8 -n kube-system -o yaml | kubectl replace --force -f -
For further details, see kubectl replace --help and the following blog