Kubernetes: CoreDNS and problem with resolving hostnames

I have two Kubernetes pods running via Rancher:

#1 - busybox
#2 - dnsutils

From pod #1:

/ # cat /etc/resolv.conf 
nameserver 10.43.0.10
search testspace.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
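
With ndots:5, any name containing fewer than five dots is first expanded through the search list before being tried as an absolute name, so a lookup of kubernetes.default roughly walks this candidate list in order (a sketch of the expansion implied by the resolv.conf above, not the output of any tool):

kubernetes.default.testspace.svc.cluster.local
kubernetes.default.svc.cluster.local
kubernetes.default.cluster.local
kubernetes.default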

Then:

/ # nslookup kubernetes.default
Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.default'
/ # nslookup kubernetes.default
Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'kubernetes.default'
/ # nslookup kubernetes.default
Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.43.0.1 kubernetes.default.svc.cluster.local

So it works sometimes, but most of the time it does not.
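
To take the search-path expansion out of the picture, the fully qualified name (taken from the successful reply above) and an external name can be queried directly against the service IP (a minimal sketch):

/ # nslookup kubernetes.default.svc.cluster.local 10.43.0.10
/ # nslookup google.com 10.43.0.10

If these also fail intermittently, the problem is with reaching CoreDNS itself rather than with how the name is expanded.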

Then from pod #2:

nameserver 10.43.0.10
search testspace.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Then:

/ # nslookup kubernetes.default
;; connection timed out; no servers could be reached

/ # nslookup kubernetes.default
;; connection timed out; no servers could be reached

/ # nslookup kubernetes.default
Server:         10.43.0.10
Address:        10.43.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.43.0.1
;; connection timed out; no servers could be reached

So it basically does not work.

The same problem occurs when I try to resolve any external hostname.
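
Since the kube-dns Service load-balances across all CoreDNS replicas, an intermittent pattern like this can show up when only some of the replicas are reachable. A quick way to see how many replicas exist and which nodes they run on (a minimal sketch, assuming the default k8s-app=kube-dns label):

kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide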

I also tried troubleshooting following the article here.

ConfigMap:

kubectl -n kube-system edit configmap coredns

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
          lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . "/etc/resolv.conf"
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"Corefile":".:53 {\n    errors\n    health {\n      lameduck 5s\n    }\n    ready\n    kubernetes cluster.local in-addr.arpa ip6.arpa {\n      pods insecure\n      fallthrough in-addr.arpa ip6.arpa\n    }\n    prometheus :9153\n    forward . \"/etc/resolv.conf\"\n    cache 30\n    loop\n    reload\n    loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"coredns","namespace":"kube-system"}}
  creationTimestamp: "2020-08-07T19:28:25Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:Corefile: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
    manager: kubectl
    operation: Update
    time: "2020-08-24T19:22:17Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "4118524"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 1f3615b0-9349-4bc5-990b-7fed31879fa2
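
Because the Corefile above enables the log plugin, every query that actually reaches a CoreDNS pod is written to its log, which helps distinguish queries that never arrive from queries CoreDNS cannot answer. A sketch for tailing those logs (again assuming the k8s-app=kube-dns label):

kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50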

Any ideas?

It looks like the kube-dns service cannot reach the CoreDNS pods:

> kubectl get svc -o wide --namespace=kube-system

NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
kube-dns         ClusterIP   10.43.0.10     <none>        53/UDP,53/TCP,9153/TCP   24d   k8s-app=kube-dns
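
The Service output only shows the ClusterIP; whether healthy CoreDNS pods are actually registered behind it can be checked on the matching Endpoints object (a minimal sketch):

kubectl get endpoints kube-dns -n kube-system

If the ENDPOINTS column is empty, or lists fewer addresses than there are CoreDNS pods, the Service has nothing (or only some replicas) to forward queries to.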

CoreDNS is reachable when queried directly at the pod IP from one node:

/ # nslookup google.com 10.42.1.18 
Server:         10.42.1.18
Address:        10.42.1.18#53

Non-authoritative answer:
Name:   google.com
Address: 172.217.10.110
Name:   google.com
Address: 2607:f8b0:4006:802::200e

But not from another node:

/ # nslookup google.com 10.42.2.37
;; connection timed out; no servers could be reached
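
To separate "DNS is broken on that replica" from "the pod network to that replica is unreachable from this node", the same pod IP can also be probed over TCP, since CoreDNS listens on 53/tcp as well as 53/udp (a minimal sketch using dig, assuming it is available in the dnsutils pod; +tcp forces TCP, +tries=1 +time=2 keeps the timeout short):

/ # dig @10.42.2.37 kubernetes.default.svc.cluster.local +tcp +tries=1 +time=2

A timeout here as well points at inter-node pod networking (for example the overlay network on that node) rather than at CoreDNS.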

This likely causes the problems with the kube-dns service.

In this case, I decided to rebuild the problematic node, and the problem went away.
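
For reference, a common way to take a misbehaving node out of rotation before rebuilding it (a minimal sketch; <node-name> is a placeholder, and the flag for evicting pods with emptyDir volumes is --delete-local-data on older kubectl versions, --delete-emptydir-data on newer ones):

kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
kubectl delete node <node-name>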