CoreDNS is forwarding ALL DNS queries to local router, including those for in-cluster service names
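I'm currently working on an issue with CoreDNS on a fresh Kubernetes setup running on Raspberry Pis.
Problem: CoreDNS forwards all DNS queries to the local gateway/router, which has no idea how to resolve any in-cluster service name, no matter how fully qualified the name is.
How I diagnosed the problem:
Every nslookup query results in an NXDOMAIN response, meaning the domain does not exist. This response always comes from the local router.
Note: in the output below, 10.32.0.2 is the IP of one of the CoreDNS pods, soc.local is the cluster domain, and wpad.fritz.box is the hostname of the local router.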
$ kubectl run -ti --rm alpine --image=alpine --restart=Never -- ash
/ # nslookup kubernetes 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes: NXDOMAIN
** server can't find kubernetes: NXDOMAIN
/ # nslookup kubernetes.default 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes.default: NXDOMAIN
** server can't find kubernetes.default: NXDOMAIN
/ # nslookup kubernetes.default.soc 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes.default.soc: NXDOMAIN
** server can't find kubernetes.default.soc: NXDOMAIN
/ # nslookup kubernetes.default.soc.local 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes.default.soc.local: NXDOMAIN
** server can't find kubernetes.default.soc.local: NXDOMAIN
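The tcpdump capture and CoreDNS logs further down show the client querying each name exactly as typed (`kubernetes.`, `kubernetes.default.`, and so on), with no search suffix appended. For comparison, here is a minimal Python sketch of the classic `resolv.conf` search rule that a stub resolver honoring the pod's search list would apply. The search list and `ndots:5` are assumptions: they are the usual ClusterFirst layout for a pod in the `default` namespace with cluster domain `soc.local`. The pod's actual `/etc/resolv.conf` was not captured here, and `candidate_names` is a hypothetical helper.

```python
# Assumed pod /etc/resolv.conf (typical ClusterFirst layout, NOT captured from this cluster):
#   search default.svc.soc.local svc.soc.local soc.local
#   options ndots:5
SEARCH = ["default.svc.soc.local", "svc.soc.local", "soc.local"]
NDOTS = 5


def candidate_names(name, search=SEARCH, ndots=NDOTS):
    """Return the FQDNs a search-list-honoring stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search suffixes are tried.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{suffix.rstrip('.')}." for suffix in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search suffixes.
        return [absolute] + expanded
    # Too few dots: try every search suffix first and the bare name last.
    return expanded + [absolute]


if __name__ == "__main__":
    for short in ("kubernetes", "kubernetes.default"):
        print(short, "->", candidate_names(short))
```

Under these assumptions, the first candidate for `kubernetes` is `kubernetes.default.svc.soc.local.`. That matches the name the Alpine 3.9.6 `nslookup` reports further down. The failing client instead sent only the last candidate, the bare `kubernetes.`.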
Here is the tcpdump output showing the network traffic for the nslookup query for kubernetes:
/ # tcpdump -i weave host 10.32.0.2 and port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on weave, link-type EN10MB (Ethernet), capture size 262144 bytes
16:57:48.047794 IP 10.32.0.5.54782 > 10.32.0.2.53: 42507+ A? kubernetes. (28)
16:57:48.048136 IP 10.32.0.5.54782 > 10.32.0.2.53: 43025+ AAAA? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.35867 > wpad.fritz.box.53: 42507+ A? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.37755 > wpad.fritz.box.53: 43025+ AAAA? kubernetes. (28)
16:57:48.050611 IP wpad.fritz.box.53 > 10.32.0.2.35867: 42507 NXDomain 0/1/0 (103)
16:57:48.050916 IP wpad.fritz.box.53 > 10.32.0.2.37755: 43025 NXDomain 0/1/0 (103)
16:57:48.051109 IP 10.32.0.2.53 > 10.32.0.5.54782: 42507 NXDomain 0/1/0 (103)
16:57:48.051503 IP 10.32.0.2.53 > 10.32.0.5.54782: 43025 NXDomain 0/1/0 (103)
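The capture can be read mechanically. A query that leaves the CoreDNS pod (10.32.0.2) for port 53 on another host is a query CoreDNS forwarded upstream. Below is a minimal Python sketch that assumes tcpdump's default one-line output exactly as shown above. The regex will not match other verbosity levels, and `forwarded_queries` and `CAPTURE` are hypothetical names.

```python
import re

# Matches a DNS *query* line in tcpdump's default one-line format, e.g.
# "16:57:48.048576 IP 10.32.0.2.35867 > wpad.fritz.box.53: 42507+ A? kubernetes. (28)"
# Response lines ("42507 NXDomain ...") have no "+" after the ID and do not match.
QUERY_RE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"(?P<qid>\d+)\+ (?P<qtype>\S+)\? (?P<qname>\S+)"
)


def forwarded_queries(lines, coredns_ip):
    """Return (query ID, type, name, upstream host) for each query CoreDNS sent onward."""
    result = []
    for line in lines:
        m = QUERY_RE.match(line.strip())
        # Only queries originating from the CoreDNS pod and aimed at a DNS port.
        if m and m["src"] == coredns_ip and m["dport"] == "53":
            result.append((int(m["qid"]), m["qtype"], m["qname"], m["dst"]))
    return result


CAPTURE = """\
16:57:48.047794 IP 10.32.0.5.54782 > 10.32.0.2.53: 42507+ A? kubernetes. (28)
16:57:48.048136 IP 10.32.0.5.54782 > 10.32.0.2.53: 43025+ AAAA? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.35867 > wpad.fritz.box.53: 42507+ A? kubernetes. (28)
16:57:48.048576 IP 10.32.0.2.37755 > wpad.fritz.box.53: 43025+ AAAA? kubernetes. (28)
16:57:48.050611 IP wpad.fritz.box.53 > 10.32.0.2.35867: 42507 NXDomain 0/1/0 (103)
16:57:48.050916 IP wpad.fritz.box.53 > 10.32.0.2.37755: 43025 NXDomain 0/1/0 (103)
16:57:48.051109 IP 10.32.0.2.53 > 10.32.0.5.54782: 42507 NXDomain 0/1/0 (103)
16:57:48.051503 IP 10.32.0.2.53 > 10.32.0.5.54782: 43025 NXDomain 0/1/0 (103)
"""

if __name__ == "__main__":
    for query in forwarded_queries(CAPTURE.splitlines(), "10.32.0.2"):
        print(query)
```

On these eight lines it reports both the A and the AAAA query for `kubernetes.` going to `wpad.fritz.box`. In other words, CoreDNS handed the bare name to its `forward` plugin instead of answering it locally.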
Here are the CoreDNS log entries for the nslookup queries:
[INFO] 10.32.0.5:53591 - 23327 "AAAA IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000318349s
[INFO] 10.32.0.5:53591 - 22735 "A IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000447718s
[INFO] 10.32.0.5:58545 - 49038 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.0314311s
[INFO] 10.32.0.5:58545 - 48445 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.033794968s
[INFO] 10.32.0.5:53665 - 62210 "A IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.047918913s
[INFO] 10.32.0.5:53665 - 62802 "AAAA IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.067865341s
[INFO] 10.32.0.5:56021 - 47416 "A IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000430478s
[INFO] 10.32.0.5:56021 - 48046 "AAAA IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000551032s
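Here is a similar sketch for the CoreDNS log lines. It assumes the line layout shown above: client, query ID, quoted question, response code, response flags, response size and duration. It pulls out the name, response code, flags and duration so the entries can be compared side by side. `parse_log_line`, `LogEntry` and `LOGS` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout assumed from the log lines above, e.g.
# [INFO] 10.32.0.5:53591 - 23327 "AAAA IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000318349s
LOG_RE = re.compile(
    r'^\[INFO\] (?P<client>\S+) - (?P<qid>\d+) '
    r'"(?P<qtype>\S+) IN (?P<qname>\S+) (?P<proto>\S+) \d+ \S+ \d+" '
    r"(?P<rcode>\S+) (?P<flags>\S+) \d+ (?P<duration>[\d.]+)s$"
)


@dataclass(frozen=True)
class LogEntry:
    qtype: str
    qname: str
    rcode: str
    flags: frozenset
    duration_s: float


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Parse one log line in the layout above; return None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(
        qtype=m["qtype"],
        qname=m["qname"],
        rcode=m["rcode"],
        flags=frozenset(m["flags"].split(",")),
        duration_s=float(m["duration"]),
    )


LOGS = """\
[INFO] 10.32.0.5:53591 - 23327 "AAAA IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000318349s
[INFO] 10.32.0.5:53591 - 22735 "A IN kubernetes. udp 28 false 512" NXDOMAIN qr,aa,rd,ra 103 0.000447718s
[INFO] 10.32.0.5:58545 - 49038 "AAAA IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.0314311s
[INFO] 10.32.0.5:58545 - 48445 "A IN kubernetes.default. udp 36 false 512" NXDOMAIN qr,rd,ra 111 0.033794968s
[INFO] 10.32.0.5:53665 - 62210 "A IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.047918913s
[INFO] 10.32.0.5:53665 - 62802 "AAAA IN kubernetes.default.soc. udp 40 false 512" NXDOMAIN qr,rd,ra 115 0.067865341s
[INFO] 10.32.0.5:56021 - 47416 "A IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000430478s
[INFO] 10.32.0.5:56021 - 48046 "AAAA IN kubernetes.default.soc.local. udp 46 false 512" NXDOMAIN qr,aa,rd 127 0.000551032s
"""

if __name__ == "__main__":
    for entry in filter(None, map(parse_log_line, LOGS.splitlines())):
        flags = ",".join(sorted(entry.flags))
        print(f"{entry.qname:32} {entry.qtype:5} {entry.rcode:9} {flags:12} {entry.duration_s:.6f}s")
```

On these eight lines, only the two `kubernetes.default.soc.local.` entries lack the `ra` flag and complete in well under a millisecond. The `kubernetes.default.` and `kubernetes.default.soc.` entries take roughly 30 to 70 ms. The `kubernetes.default.soc.local.` queries still return NXDOMAIN, which is consistent with the service actually being named `kubernetes.default.svc.soc.local`, as the Alpine 3.9.6 output further down shows.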
Here is the configmap holding the CoreDNS Corefile:
$ k get cm coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes soc.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-07-07T20:58:06Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data: {}
    manager: kubeadm
    operation: Update
    time: "2020-07-07T20:58:06Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:Corefile: {}
    manager: kubectl
    operation: Update
    time: "2020-07-28T17:21:46Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "2464367"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c6a603c3-30b6-4156-b62e-a98d53761541
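Why does only the last name stay inside CoreDNS? Here is a deliberately rough Python model of the server block above. Names inside the zones listed on the `kubernetes` line are answered by that plugin. Everything else goes to `forward . /etc/resolv.conf`, which the capture shows reaching `wpad.fritz.box`. The model ignores `fallthrough` for the reverse zones, `cache` and every other plugin. `in_zone` and `handling_plugin` are hypothetical helpers, not CoreDNS code.

```python
# Zones taken from the `kubernetes soc.local in-addr.arpa ip6.arpa` line of the Corefile above.
KUBERNETES_ZONES = ("soc.local.", "in-addr.arpa.", "ip6.arpa.")


def in_zone(qname: str, zone: str) -> bool:
    """True if qname equals zone or is a subdomain of it (case-insensitive, dot-normalized)."""
    qname = qname.lower().rstrip(".") + "."
    zone = zone.lower().rstrip(".") + "."
    return qname == zone or qname.endswith("." + zone)


def handling_plugin(qname: str) -> str:
    """Rough model: 'kubernetes' for names in its zones, otherwise 'forward'.

    Simplification: real CoreDNS also lets reverse-zone misses fall through
    to `forward` (the `fallthrough` line), which is not modeled here.
    """
    if any(in_zone(qname, zone) for zone in KUBERNETES_ZONES):
        return "kubernetes"
    return "forward"


if __name__ == "__main__":
    for name in ("kubernetes.", "kubernetes.default.", "kubernetes.default.soc.",
                 "kubernetes.default.soc.local.", "kubernetes.default.svc.soc.local."):
        print(f"{name:36} -> {handling_plugin(name)}")
```

Under this model, `kubernetes.`, `kubernetes.default.` and `kubernetes.default.soc.` all fall outside `soc.local` and are forwarded, which matches the capture and the logs. A fully qualified name such as `kubernetes.default.svc.soc.local.` would stay with the `kubernetes` plugin.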
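My question: why doesn't CoreDNS answer these DNS queries for in-cluster service names itself?
I'm not sure what else to debug. Unfortunately, the CoreDNS container image has no shell, so I can't inspect its /etc/resolv.conf file.
Any suggestions?
Shortly after posting the question, I re-read the Kubernetes documentation on debugging DNS name resolution. The last paragraph of its Known issues section mentions that some Alpine versions have DNS problems. The linked GitHub issue doesn't describe my problem in exactly the same way, but the Alpine version does seem to be the culprit: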
$ kubectl run -ti --rm alpine --image=alpine:3.9.6 --restart=Never -- ash
If you don't see a command prompt, try pressing enter.
/ # nslookup kubernetes 10.32.0.2
Server: 10.32.0.2
Address 1: 10.32.0.2 10-32-0-2.kube-dns.kube-system.svc.soc.local
Name: kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.soc.local
/ # pod "alpine" deleted
$ kubectl run -ti --rm alpine --image=alpine --restart=Never -- ash
If you don't see a command prompt, try pressing enter.
/ # nslookup kubernetes 10.32.0.2
Server: 10.32.0.2
Address: 10.32.0.2:53
** server can't find kubernetes: NXDOMAIN
** server can't find kubernetes: NXDOMAIN