kubectl 无法访问新配置的 kubernetes 节点
Newly provisioned kubernetes nodes are inaccessible by kubectl
我在 Kubernetes 1.9 中使用 Kubespray
当我尝试通过 kubectl 与新节点上的 pods 交互时,我看到的是以下内容。重要的是要注意节点被认为是健康的并且在它们上适当地安排了 pods。 pods 完全正常。
➜ Scripts k logs -f -n prometheus prometheus-prometheus-node-exporter-gckzj
Error from server: Get https://kubeworker-rwva1-prod-14:10250/containerLogs/prometheus/prometheus-prometheus-node-exporter-gckzj/prometheus-node-exporter?follow=true: dial tcp: lookup kubeworker-rwva1-prod-14 on 10.0.0.3:53: no such host
我能够通过 IP 和 DNS 在本地 运行 kubectl 和所有主节点上 ping 到 kubeworker 节点。
➜ Scripts ping kubeworker-rwva1-prod-14
PING kubeworker-rwva1-prod-14 (10.0.0.111): 56 data bytes
64 bytes from 10.0.0.111: icmp_seq=0 ttl=63 time=88.972 ms
^C
pubuntu@kubemaster-rwva1-prod-1:~$ ping kubeworker-rwva1-prod-14
PING kubeworker-rwva1-prod-14 (10.0.0.111) 56(84) bytes of data.
64 bytes from kubeworker-rwva1-prod-14 (10.0.0.111): icmp_seq=1 ttl=64 time=0.259 ms
64 bytes from kubeworker-rwva1-prod-14 (10.0.0.111): icmp_seq=2 ttl=64 time=0.213 ms
➜ Scripts k get nodes
NAME STATUS ROLES AGE VERSION
kubemaster-rwva1-prod-1 Ready master 174d v1.9.2+coreos.0
kubemaster-rwva1-prod-2 Ready master 174d v1.9.2+coreos.0
kubemaster-rwva1-prod-3 Ready master 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-1 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-10 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-11 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-12 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-14 Ready node 16d v1.9.2+coreos.0
kubeworker-rwva1-prod-15 Ready node 14d v1.9.2+coreos.0
kubeworker-rwva1-prod-16 Ready node 6d v1.9.2+coreos.0
kubeworker-rwva1-prod-17 Ready node 4d v1.9.2+coreos.0
kubeworker-rwva1-prod-18 Ready node 4d v1.9.2+coreos.0
kubeworker-rwva1-prod-19 Ready node 6d v1.9.2+coreos.0
kubeworker-rwva1-prod-2 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-20 Ready node 6d v1.9.2+coreos.0
kubeworker-rwva1-prod-21 Ready node 6d v1.9.2+coreos.0
kubeworker-rwva1-prod-3 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-4 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-5 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-6 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-7 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-8 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-9 Ready node 174d v1.9.2+coreos.0
当我描述一个损坏的节点时,它看起来与我的一个正常运行的节点相同。
➜ Scripts k describe node kubeworker-rwva1-prod-14
Name: kubeworker-rwva1-prod-14
Roles: node
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=kubeworker-rwva1-prod-14
node-role.kubernetes.io/node=true
role=app-tier
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Tue, 17 Jul 2018 19:35:08 -0700
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 03 Aug 2018 18:44:59 -0700 Tue, 17 Jul 2018 19:35:08 -0700 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 03 Aug 2018 18:44:59 -0700 Tue, 17 Jul 2018 19:35:08 -0700 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 03 Aug 2018 18:44:59 -0700 Tue, 17 Jul 2018 19:35:08 -0700 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Fri, 03 Aug 2018 18:44:59 -0700 Tue, 17 Jul 2018 19:35:18 -0700 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.0.111
Hostname: kubeworker-rwva1-prod-14
Capacity:
cpu: 32
memory: 147701524Ki
pods: 110
Allocatable:
cpu: 31900m
memory: 147349124Ki
pods: 110
System Info:
Machine ID: da30025a3f8fd6c3f4de778c5b4cf558
System UUID: 5ACCBB64-2533-E611-97F0-0894EF1D343B
Boot ID: 6b42ba3e-36c4-4520-97e6-e7c6fed195e2
Kernel Version: 4.4.0-130-generic
OS Image: Ubuntu 16.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.1
Kubelet Version: v1.9.2+coreos.0
Kube-Proxy Version: v1.9.2+coreos.0
ExternalID: kubeworker-rwva1-prod-14
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system calico-node-cd7qg 150m (0%) 300m (0%) 64M (0%) 500M (0%)
kube-system kube-proxy-kubeworker-rwva1-prod-14 150m (0%) 500m (1%) 64M (0%) 2G (1%)
kube-system nginx-proxy-kubeworker-rwva1-prod-14 25m (0%) 300m (0%) 32M (0%) 512M (0%)
prometheus prometheus-prometheus-node-exporter-gckzj 0 (0%) 0 (0%) 0 (0%) 0 (0%)
rabbit-relay rabbit-relay-844d6865c7-q6fr2 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
325m (1%) 1100m (3%) 160M (0%) 3012M (1%)
Events: <none>
➜ Scripts k describe node kubeworker-rwva1-prod-11
Name: kubeworker-rwva1-prod-11
Roles: node
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=kubeworker-rwva1-prod-11
node-role.kubernetes.io/node=true
role=test
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Fri, 09 Feb 2018 21:03:46 -0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 03 Aug 2018 18:46:31 -0700 Fri, 09 Feb 2018 21:03:38 -0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 03 Aug 2018 18:46:31 -0700 Mon, 16 Jul 2018 13:24:58 -0700 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 03 Aug 2018 18:46:31 -0700 Mon, 16 Jul 2018 13:24:58 -0700 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Fri, 03 Aug 2018 18:46:31 -0700 Mon, 16 Jul 2018 13:24:58 -0700 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.0.218
Hostname: kubeworker-rwva1-prod-11
Capacity:
cpu: 32
memory: 131985484Ki
pods: 110
Allocatable:
cpu: 31900m
memory: 131633084Ki
pods: 110
System Info:
Machine ID: 0ff6f3b9214b38ad07c063d45a6a5175
System UUID: 4C4C4544-0044-5710-8037-B3C04F525631
Boot ID: 4d7ec0fc-428f-4b4c-aaae-8e70f374fbb1
Kernel Version: 4.4.0-87-generic
OS Image: Ubuntu 16.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.1
Kubelet Version: v1.9.2+coreos.0
Kube-Proxy Version: v1.9.2+coreos.0
ExternalID: kubeworker-rwva1-prod-11
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
ingress-nginx-internal default-http-backend-internal-7c8ff87c86-955np 10m (0%) 10m (0%) 20Mi (0%) 20Mi (0%)
kube-system calico-node-8fzk6 150m (0%) 300m (0%) 64M (0%) 500M (0%)
kube-system kube-proxy-kubeworker-rwva1-prod-11 150m (0%) 500m (1%) 64M (0%) 2G (1%)
kube-system nginx-proxy-kubeworker-rwva1-prod-11 25m (0%) 300m (0%) 32M (0%) 512M (0%)
prometheus prometheus-prometheus-kube-state-metrics-7c5cbb6f55-jq97n 0 (0%) 0 (0%) 0 (0%) 0 (0%)
prometheus prometheus-prometheus-node-exporter-7gn2x 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
335m (1%) 1110m (3%) 176730Ki (0%) 3032971520 (2%)
Events: <none>
怎么回事?
➜ k logs -f -n prometheus prometheus-prometheus-node-exporter-gckzj
Error from server: Get https://kubeworker-rwva1-prod-14:10250/containerLogs/prometheus/prometheus-prometheus-node-exporter-gckzj/prometheus-node-exporter?follow=true: dial tcp: lookup kubeworker-rwva1-prod-14 on 10.0.0.3:53: no such host
➜ cat /etc/hosts | head -n1
10.0.0.111 kubeworker-rwva1-prod-14
ubuntu@kubemaster-rwva1-prod-1:~$ ping kubeworker-rwva1-prod-14
PING kubeworker-rwva1-prod-14 (10.0.0.111) 56(84) bytes of data.
64 bytes from kubeworker-rwva1-prod-14 (10.0.0.111): icmp_seq=1 ttl=64 time=0.275 ms
^C
--- kubeworker-rwva1-prod-14 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.275/0.275/0.275/0.000 ms
ubuntu@kubemaster-rwva1-prod-1:~$ kubectl logs -f -n prometheus prometheus-prometheus-node-exporter-gckzj
Error from server: Get https://kubeworker-rwva1-prod-14:10250/containerLogs/prometheus/prometheus-prometheus-node-exporter-gckzj/prometheus-node-exporter?follow=true: dial tcp: lookup kubeworker-rwva1-prod-14 on 10.0.0.3:53: no such host
What's going on?
该名称需要可从您的工作站解析,因为对于 kubectl logs
和 kubectl exec
,API 发送 URL 供客户端交互 直接与目标节点上的kubelet
(以确保世界上的所有流量都不通过API服务器)。
值得庆幸的是,kubespray 有一个旋钮,通过它您可以告诉 kubernetes 更喜欢节点的 ExternalIP
(当然,如果您愿意,也可以是 InternalIP
):https://github.com/kubernetes-incubator/kubespray/blob/v2.5.0/roles/kubernetes/master/defaults/main.yml#L82
疯狂的问题。我不知道我是如何解决这个问题的。但是我以某种方式通过删除我的一个非功能节点并使用完整的 FQDN 重新注册它来将它重新组合在一起。这以某种方式修复了一切。然后我能够删除 FQDN 注册节点并重新创建它的短名称。
在大量 TCPdumping 之后,我能想到的最好的解释是错误消息是准确的,但是以一种非常愚蠢和令人困惑的方式。
{"kind":"Pod","apiVersion":"v1","metadata":{"name":"prometheus-prometheus-node-exporter-gckzj","generateName":"prometheus-prometheus-node-exporter-","namespace":"prometheus","selfLink":"/api/v1/namespaces/prometheus/pods/prometheus-prometheus-node-exporter-gckzj","uid":"2fa4b744-8a33-11e8-9b15-bc305bef2e18","resourceVersion":"37138627","creationTimestamp":"2018-07-18T02:35:08Z","labels":{"app":"prometheus","component":"node-exporter","controller-revision-hash":"1725903292","pod-template-generation":"1","release":"prometheus"},"ownerReferences":[{"apiVersion":"extensions/v1beta1","kind":"DaemonSet","name":"prometheus-prometheus-node-exporter","uid":"e9216885-1616-11e8-b853-d4ae528b79ed","controller":true,"blockOwnerDeletion":true}]},"spec":{"volumes":[{"name":"proc","hostPath":{"path":"/proc","type":""}},{"name":"sys","hostPath":{"path":"/sys","type":""}},{"name":"prometheus-prometheus-node-exporter-token-zvrdk","secret":{"secretName":"prometheus-prometheus-node-exporter-token-zvrdk","defaultMode":420}}],"containers":[{"name":"prometheus-node-exporter","image":"prom/node-exporter:v0.15.2","args":["--path.procfs=/host/proc","--path.sysfs=/host/sys"],"ports":[{"name":"metrics","hostPort":9100,"containerPort":9100,"protocol":"TCP"}],"resources":{},"volumeMounts":[{"name":"proc","readOnly":true,"mountPath":"/host/proc"},{"name":"sys","readOnly":true,"mountPath":"/host/sys"},{"name":"prometheus-prometheus-node-exporter-token-zvrdk","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"prometheus-prometheus-node-exporter","serviceAccount":"prometheus-prometheus-node-exporter","nodeName":"kubeworker-rwva1-prod-14","hostNetwork":true,"hostPID":true,"securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute"},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute"},{"key":"node.kubernetes.io/disk-pressure","operator":"Exists","effect":"NoSchedule"},{"key":"node.kubernetes.io/memory-pressure","operator":"Exists","effect":"NoSchedule"}]},"status":{"phase":"Running","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-07-18T02:35:13Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-07-20T08:02:58Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-07-18T02:35:14Z"}],"hostIP":"10.0.0.111","podIP":"10.0.0.111","startTime":"2018-07-18T02:35:13Z","containerStatuses":[{"name":"prometheus-node-exporter","state":{"running":{"startedAt":"2018-07-20T08:02:58Z"}},"lastState":{"terminated":{"exitCode":143,"reason":"Error","startedAt":"2018-07-20T08:02:27Z","finishedAt":"2018-07-20T08:02:39Z","containerID":"docker://db44927ad64eb130a73bee3c7b250f55ad911584415c373d3e3fa0fc838c146e"}},"ready":true,"restartCount":2,"image":"prom/node-exporter:v0.15.2","imageID":"docker-pullable://prom/node-exporter@sha256:6965ed8f31c5edba19d269d10238f59624e6b004f650ce925b3408ce222f9e49","containerID":"docker://4743ad5c5e60c31077e57d51eb522270c96ed227bab6522b4fcde826c4abc064"}],"qosClass":"BestEffort"}}
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get https://kubeworker-rwva1-prod-14:10250/containerLogs/prometheus/prometheus-prometheus-node-exporter-gckzj/prometheus-node-exporter?follow=true: dial tcp: lookup kubeworker-rwva1-prod-14 on 10.0.0.3:53: no such host","code":500}
集群的内部 DNS 无法正确读取 API 以生成必要的记录。如果没有 DNS 授权的名称,集群会使用我的上游 DNS 记录来尝试递归解析该名称。上游 DNS 服务器不知道如何处理没有 tld 后缀的短格式名称。
我在 Kubernetes 1.9 中使用 Kubespray
当我尝试通过 kubectl 与新节点上的 pods 交互时,我看到的是以下内容。重要的是要注意节点被认为是健康的并且在它们上适当地安排了 pods。 pods 完全正常。
➜ Scripts k logs -f -n prometheus prometheus-prometheus-node-exporter-gckzj
Error from server: Get https://kubeworker-rwva1-prod-14:10250/containerLogs/prometheus/prometheus-prometheus-node-exporter-gckzj/prometheus-node-exporter?follow=true: dial tcp: lookup kubeworker-rwva1-prod-14 on 10.0.0.3:53: no such host
我能够通过 IP 和 DNS 在本地 运行 kubectl 和所有主节点上 ping 到 kubeworker 节点。
➜ Scripts ping kubeworker-rwva1-prod-14
PING kubeworker-rwva1-prod-14 (10.0.0.111): 56 data bytes
64 bytes from 10.0.0.111: icmp_seq=0 ttl=63 time=88.972 ms
^C
pubuntu@kubemaster-rwva1-prod-1:~$ ping kubeworker-rwva1-prod-14
PING kubeworker-rwva1-prod-14 (10.0.0.111) 56(84) bytes of data.
64 bytes from kubeworker-rwva1-prod-14 (10.0.0.111): icmp_seq=1 ttl=64 time=0.259 ms
64 bytes from kubeworker-rwva1-prod-14 (10.0.0.111): icmp_seq=2 ttl=64 time=0.213 ms
➜ Scripts k get nodes
NAME STATUS ROLES AGE VERSION
kubemaster-rwva1-prod-1 Ready master 174d v1.9.2+coreos.0
kubemaster-rwva1-prod-2 Ready master 174d v1.9.2+coreos.0
kubemaster-rwva1-prod-3 Ready master 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-1 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-10 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-11 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-12 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-14 Ready node 16d v1.9.2+coreos.0
kubeworker-rwva1-prod-15 Ready node 14d v1.9.2+coreos.0
kubeworker-rwva1-prod-16 Ready node 6d v1.9.2+coreos.0
kubeworker-rwva1-prod-17 Ready node 4d v1.9.2+coreos.0
kubeworker-rwva1-prod-18 Ready node 4d v1.9.2+coreos.0
kubeworker-rwva1-prod-19 Ready node 6d v1.9.2+coreos.0
kubeworker-rwva1-prod-2 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-20 Ready node 6d v1.9.2+coreos.0
kubeworker-rwva1-prod-21 Ready node 6d v1.9.2+coreos.0
kubeworker-rwva1-prod-3 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-4 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-5 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-6 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-7 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-8 Ready node 174d v1.9.2+coreos.0
kubeworker-rwva1-prod-9 Ready node 174d v1.9.2+coreos.0
当我描述一个损坏的节点时,它看起来与我的一个正常运行的节点相同。
➜ Scripts k describe node kubeworker-rwva1-prod-14
Name: kubeworker-rwva1-prod-14
Roles: node
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=kubeworker-rwva1-prod-14
node-role.kubernetes.io/node=true
role=app-tier
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Tue, 17 Jul 2018 19:35:08 -0700
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 03 Aug 2018 18:44:59 -0700 Tue, 17 Jul 2018 19:35:08 -0700 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 03 Aug 2018 18:44:59 -0700 Tue, 17 Jul 2018 19:35:08 -0700 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 03 Aug 2018 18:44:59 -0700 Tue, 17 Jul 2018 19:35:08 -0700 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Fri, 03 Aug 2018 18:44:59 -0700 Tue, 17 Jul 2018 19:35:18 -0700 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.0.111
Hostname: kubeworker-rwva1-prod-14
Capacity:
cpu: 32
memory: 147701524Ki
pods: 110
Allocatable:
cpu: 31900m
memory: 147349124Ki
pods: 110
System Info:
Machine ID: da30025a3f8fd6c3f4de778c5b4cf558
System UUID: 5ACCBB64-2533-E611-97F0-0894EF1D343B
Boot ID: 6b42ba3e-36c4-4520-97e6-e7c6fed195e2
Kernel Version: 4.4.0-130-generic
OS Image: Ubuntu 16.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.1
Kubelet Version: v1.9.2+coreos.0
Kube-Proxy Version: v1.9.2+coreos.0
ExternalID: kubeworker-rwva1-prod-14
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system calico-node-cd7qg 150m (0%) 300m (0%) 64M (0%) 500M (0%)
kube-system kube-proxy-kubeworker-rwva1-prod-14 150m (0%) 500m (1%) 64M (0%) 2G (1%)
kube-system nginx-proxy-kubeworker-rwva1-prod-14 25m (0%) 300m (0%) 32M (0%) 512M (0%)
prometheus prometheus-prometheus-node-exporter-gckzj 0 (0%) 0 (0%) 0 (0%) 0 (0%)
rabbit-relay rabbit-relay-844d6865c7-q6fr2 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
325m (1%) 1100m (3%) 160M (0%) 3012M (1%)
Events: <none>
➜ Scripts k describe node kubeworker-rwva1-prod-11
Name: kubeworker-rwva1-prod-11
Roles: node
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=kubeworker-rwva1-prod-11
node-role.kubernetes.io/node=true
role=test
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Fri, 09 Feb 2018 21:03:46 -0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 03 Aug 2018 18:46:31 -0700 Fri, 09 Feb 2018 21:03:38 -0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 03 Aug 2018 18:46:31 -0700 Mon, 16 Jul 2018 13:24:58 -0700 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 03 Aug 2018 18:46:31 -0700 Mon, 16 Jul 2018 13:24:58 -0700 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Fri, 03 Aug 2018 18:46:31 -0700 Mon, 16 Jul 2018 13:24:58 -0700 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.0.218
Hostname: kubeworker-rwva1-prod-11
Capacity:
cpu: 32
memory: 131985484Ki
pods: 110
Allocatable:
cpu: 31900m
memory: 131633084Ki
pods: 110
System Info:
Machine ID: 0ff6f3b9214b38ad07c063d45a6a5175
System UUID: 4C4C4544-0044-5710-8037-B3C04F525631
Boot ID: 4d7ec0fc-428f-4b4c-aaae-8e70f374fbb1
Kernel Version: 4.4.0-87-generic
OS Image: Ubuntu 16.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.1
Kubelet Version: v1.9.2+coreos.0
Kube-Proxy Version: v1.9.2+coreos.0
ExternalID: kubeworker-rwva1-prod-11
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
ingress-nginx-internal default-http-backend-internal-7c8ff87c86-955np 10m (0%) 10m (0%) 20Mi (0%) 20Mi (0%)
kube-system calico-node-8fzk6 150m (0%) 300m (0%) 64M (0%) 500M (0%)
kube-system kube-proxy-kubeworker-rwva1-prod-11 150m (0%) 500m (1%) 64M (0%) 2G (1%)
kube-system nginx-proxy-kubeworker-rwva1-prod-11 25m (0%) 300m (0%) 32M (0%) 512M (0%)
prometheus prometheus-prometheus-kube-state-metrics-7c5cbb6f55-jq97n 0 (0%) 0 (0%) 0 (0%) 0 (0%)
prometheus prometheus-prometheus-node-exporter-7gn2x 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
335m (1%) 1110m (3%) 176730Ki (0%) 3032971520 (2%)
Events: <none>
怎么回事?
➜ k logs -f -n prometheus prometheus-prometheus-node-exporter-gckzj
Error from server: Get https://kubeworker-rwva1-prod-14:10250/containerLogs/prometheus/prometheus-prometheus-node-exporter-gckzj/prometheus-node-exporter?follow=true: dial tcp: lookup kubeworker-rwva1-prod-14 on 10.0.0.3:53: no such host
➜ cat /etc/hosts | head -n1
10.0.0.111 kubeworker-rwva1-prod-14
ubuntu@kubemaster-rwva1-prod-1:~$ ping kubeworker-rwva1-prod-14
PING kubeworker-rwva1-prod-14 (10.0.0.111) 56(84) bytes of data.
64 bytes from kubeworker-rwva1-prod-14 (10.0.0.111): icmp_seq=1 ttl=64 time=0.275 ms
^C
--- kubeworker-rwva1-prod-14 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.275/0.275/0.275/0.000 ms
ubuntu@kubemaster-rwva1-prod-1:~$ kubectl logs -f -n prometheus prometheus-prometheus-node-exporter-gckzj
Error from server: Get https://kubeworker-rwva1-prod-14:10250/containerLogs/prometheus/prometheus-prometheus-node-exporter-gckzj/prometheus-node-exporter?follow=true: dial tcp: lookup kubeworker-rwva1-prod-14 on 10.0.0.3:53: no such host
What's going on?
该名称需要可从您的工作站解析,因为对于 kubectl logs
和 kubectl exec
,API 发送 URL 供客户端交互 直接与目标节点上的kubelet
(以确保世界上的所有流量都不通过API服务器)。
值得庆幸的是,kubespray 有一个旋钮,通过它您可以告诉 kubernetes 更喜欢节点的 ExternalIP
(当然,如果您愿意,也可以是 InternalIP
):https://github.com/kubernetes-incubator/kubespray/blob/v2.5.0/roles/kubernetes/master/defaults/main.yml#L82
疯狂的问题。我不知道我是如何解决这个问题的。但是我以某种方式通过删除我的一个非功能节点并使用完整的 FQDN 重新注册它来将它重新组合在一起。这以某种方式修复了一切。然后我能够删除 FQDN 注册节点并重新创建它的短名称。
在大量 TCPdumping 之后,我能想到的最好的解释是错误消息是准确的,但是以一种非常愚蠢和令人困惑的方式。
{"kind":"Pod","apiVersion":"v1","metadata":{"name":"prometheus-prometheus-node-exporter-gckzj","generateName":"prometheus-prometheus-node-exporter-","namespace":"prometheus","selfLink":"/api/v1/namespaces/prometheus/pods/prometheus-prometheus-node-exporter-gckzj","uid":"2fa4b744-8a33-11e8-9b15-bc305bef2e18","resourceVersion":"37138627","creationTimestamp":"2018-07-18T02:35:08Z","labels":{"app":"prometheus","component":"node-exporter","controller-revision-hash":"1725903292","pod-template-generation":"1","release":"prometheus"},"ownerReferences":[{"apiVersion":"extensions/v1beta1","kind":"DaemonSet","name":"prometheus-prometheus-node-exporter","uid":"e9216885-1616-11e8-b853-d4ae528b79ed","controller":true,"blockOwnerDeletion":true}]},"spec":{"volumes":[{"name":"proc","hostPath":{"path":"/proc","type":""}},{"name":"sys","hostPath":{"path":"/sys","type":""}},{"name":"prometheus-prometheus-node-exporter-token-zvrdk","secret":{"secretName":"prometheus-prometheus-node-exporter-token-zvrdk","defaultMode":420}}],"containers":[{"name":"prometheus-node-exporter","image":"prom/node-exporter:v0.15.2","args":["--path.procfs=/host/proc","--path.sysfs=/host/sys"],"ports":[{"name":"metrics","hostPort":9100,"containerPort":9100,"protocol":"TCP"}],"resources":{},"volumeMounts":[{"name":"proc","readOnly":true,"mountPath":"/host/proc"},{"name":"sys","readOnly":true,"mountPath":"/host/sys"},{"name":"prometheus-prometheus-node-exporter-token-zvrdk","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"prometheus-prometheus-node-exporter","serviceAccount":"prometheus-prometheus-node-exporter","nodeName":"kubeworker-rwva1-prod-14","hostNetwork":true,"hostPID":true,"securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute"},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute"},{"key":"node.kubernetes.io/disk-pressure","operator":"Exists","effect":"NoSchedule"},{"key":"node.kubernetes.io/memory-pressure","operator":"Exists","effect":"NoSchedule"}]},"status":{"phase":"Running","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-07-18T02:35:13Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-07-20T08:02:58Z"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-07-18T02:35:14Z"}],"hostIP":"10.0.0.111","podIP":"10.0.0.111","startTime":"2018-07-18T02:35:13Z","containerStatuses":[{"name":"prometheus-node-exporter","state":{"running":{"startedAt":"2018-07-20T08:02:58Z"}},"lastState":{"terminated":{"exitCode":143,"reason":"Error","startedAt":"2018-07-20T08:02:27Z","finishedAt":"2018-07-20T08:02:39Z","containerID":"docker://db44927ad64eb130a73bee3c7b250f55ad911584415c373d3e3fa0fc838c146e"}},"ready":true,"restartCount":2,"image":"prom/node-exporter:v0.15.2","imageID":"docker-pullable://prom/node-exporter@sha256:6965ed8f31c5edba19d269d10238f59624e6b004f650ce925b3408ce222f9e49","containerID":"docker://4743ad5c5e60c31077e57d51eb522270c96ed227bab6522b4fcde826c4abc064"}],"qosClass":"BestEffort"}}
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get https://kubeworker-rwva1-prod-14:10250/containerLogs/prometheus/prometheus-prometheus-node-exporter-gckzj/prometheus-node-exporter?follow=true: dial tcp: lookup kubeworker-rwva1-prod-14 on 10.0.0.3:53: no such host","code":500}
集群的内部 DNS 无法正确读取 API 以生成必要的记录。如果没有 DNS 授权的名称,集群会使用我的上游 DNS 记录来尝试递归解析该名称。上游 DNS 服务器不知道如何处理没有 tld 后缀的短格式名称。