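Resolving external domains from within pods does not work
What happened
Resolving external domains from within pods fails with a SERVFAIL message. The logs mention i/o timeout errors.
What I expected to happen
External domains should resolve successfully from pods.
How to reproduce it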
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
Create the pod above (taken from the Debugging DNS Resolution help page).
Run kubectl exec dnsutils -it -- nslookup google.com
pig@pig202:~$ kubectl exec dnsutils -it -- nslookup google.com
Server: 10.152.183.10
Address: 10.152.183.10#53
** server can't find google.com.mshome.net: SERVFAIL
command terminated with exit code 1
Also run the same lookup with a trailing dot: kubectl exec dnsutils -it -- nslookup google.com.
pig@pig202:~$ kubectl exec dnsutils -it -- nslookup google.com.
Server: 10.152.183.10
Address: 10.152.183.10#53
** server can't find google.com: SERVFAIL
command terminated with exit code 1
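The two lookups report different names (google.com.mshome.net versus google.com) because of how the resolver search list is applied. The following is a simplified sketch of that rule, not the actual resolver code. The function name candidate_names is hypothetical, ndots 5 is the usual Kubernetes default for pods, and the search list in the usage line is an assumption, since the pod's /etc/resolv.conf is not shown in this report.

```shell
# List the names a stub resolver would try, in order, for NAME given an
# ndots threshold and a search list (simplified model of resolv.conf semantics).
# Usage: candidate_names NAME NDOTS [SEARCH_DOMAIN...]
candidate_names() {
  name=$1
  ndots=$2
  shift 2
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;   # trailing dot: absolute name, search list skipped
  esac
  # Count the dots in the name; $(( )) strips any padding that wc may add.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"                   # enough dots: tried as-is first
  fi
  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"      # each search domain is appended in turn
  done
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"                   # too few dots: tried as-is last
  fi
}
```

For example, candidate_names google.com 5 default.svc.cluster.local svc.cluster.local cluster.local mshome.net lists the search-domain variants before google.com itself, while candidate_names google.com. 5 mshome.net lists only google.com. Either way the query ends up at the upstream forwarders, which is where it fails here.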
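Additional information
I am using a microk8s environment in a Hyper-V virtual machine.
DNS resolution from the virtual machine itself succeeds, and Kubernetes is able to pull container images. Resolution fails only inside pods, which means I cannot reach the Internet from within pods.
This works fine: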
pig@pig202:~$ kubectl exec dnsutils -it -- nslookup kubernetes.default
Server: 10.152.183.10
Address: 10.152.183.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.152.183.1
Environment
CoreDNS version
image: 'coredns/coredns:1.6.6'
Corefile (taken from the ConfigMap)
Corefile: |
  .:53 {
      errors
      health {
        lameduck 5s
      }
      ready
      log . {
        class error
      }
      kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . 8.8.8.8 8.8.4.4
      cache 30
      loop
      reload
      loadbalance
  }
Logs
pig@pig202:~$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns -f
[INFO] 10.1.99.26:47204 - 29832 "AAAA IN grafana.com. udp 29 false 512" NOERROR - 0 2.0002558s
[ERROR] plugin/errors: 2 grafana.com. AAAA: read udp 10.1.99.19:52008->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:59350 - 50446 "A IN grafana.com. udp 29 false 512" NOERROR - 0 2.0002028s
[ERROR] plugin/errors: 2 grafana.com. A: read udp 10.1.99.19:60405->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:43050 - 13676 "AAAA IN grafana.com. udp 29 false 512" NOERROR - 0 2.0002151s
[ERROR] plugin/errors: 2 grafana.com. AAAA: read udp 10.1.99.19:45624->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:36997 - 30359 "A IN grafana.com. udp 29 false 512" NOERROR - 0 2.0002791s
[ERROR] plugin/errors: 2 grafana.com. A: read udp 10.1.99.19:37554->8.8.4.4:53: i/o timeout
[INFO] 10.1.99.32:57927 - 53858 "A IN google.com.mshome.net. udp 39 false 512" NOERROR - 0 2.0001987s
[ERROR] plugin/errors: 2 google.com.mshome.net. A: read udp 10.1.99.19:34079->8.8.4.4:53: i/o timeout
[INFO] 10.1.99.32:38403 - 36398 "A IN google.com.mshome.net. udp 39 false 512" NOERROR - 0 2.000224s
[ERROR] plugin/errors: 2 google.com.mshome.net. A: read udp 10.1.99.19:59835->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:57447 - 20295 "AAAA IN grafana.com.mshome.net. udp 40 false 512" NOERROR - 0 2.0001892s
[ERROR] plugin/errors: 2 grafana.com.mshome.net. AAAA: read udp 10.1.99.19:51534->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:41052 - 56059 "A IN grafana.com.mshome.net. udp 40 false 512" NOERROR - 0 2.0001879s
[ERROR] plugin/errors: 2 grafana.com.mshome.net. A: read udp 10.1.99.19:47378->8.8.8.8:53: i/o timeout
[INFO] 10.1.99.26:56748 - 51804 "AAAA IN grafana.com.mshome.net. udp 40 false 512" NOERROR - 0 2.0003226s
[INFO] 10.1.99.26:45442 - 61916 "A IN grafana.com.mshome.net. udp 40 false 512" NOERROR - 0 2.0001922s
[ERROR] plugin/errors: 2 grafana.com.mshome.net. AAAA: read udp 10.1.99.19:35528->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 grafana.com.mshome.net. A: read udp 10.1.99.19:53568->8.8.8.8:53: i/o timeout
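To see at a glance which upstream servers are timing out, the error lines can be grouped by destination address. This is a minimal sketch that assumes the log format shown above; the function name count_timeouts_by_upstream is hypothetical.

```shell
# Count "i/o timeout" errors per upstream resolver in CoreDNS log text read from stdin.
# Prints one "ADDRESS COUNT" pair per line.
count_timeouts_by_upstream() {
  grep 'i/o timeout' \
    | sed -n 's/.*->\([0-9.]*\):53: i\/o timeout.*/\1/p' \
    | sort | uniq -c | awk '{print $2, $1}'
}
```

It could be used as: kubectl logs --namespace=kube-system -l k8s-app=kube-dns | count_timeouts_by_upstream. In the log above every timeout points at 8.8.8.8 or 8.8.4.4, which suggests the pod network cannot reach the public resolvers at all.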
OS
pig@pig202:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
I also tried Ubuntu 18.04.3 LTS and hit the same problem.
Other
The mshome.net search domain comes from the Hyper-V network, I assume. Maybe this helps:
pig@pig202:~$ nmcli device show eth0
GENERAL.DEVICE: eth0
GENERAL.TYPE: ethernet
GENERAL.HWADDR: 00:15:5D:88:26:02
GENERAL.MTU: 1500
GENERAL.STATE: 100 (connected)
GENERAL.CONNECTION: Wired connection 1
GENERAL.CON-PATH: /org/freedesktop/NetworkManager/ActiveConnection/1
WIRED-PROPERTIES.CARRIER: on
IP4.ADDRESS[1]: 172.19.120.188/28
IP4.GATEWAY: 172.19.120.177
IP4.ROUTE[1]: dst = 0.0.0.0/0, nh = 172.19.120.177, mt = 100
IP4.ROUTE[2]: dst = 172.19.120.176/28, nh = 0.0.0.0, mt = 100
IP4.ROUTE[3]: dst = 169.254.0.0/16, nh = 0.0.0.0, mt = 1000
IP4.DNS[1]: 172.19.120.177
IP4.DOMAIN[1]: mshome.net
IP6.ADDRESS[1]: fe80::6b4a:57e2:5f1b:f739/64
IP6.GATEWAY: --
IP6.ROUTE[1]: dst = fe80::/64, nh = ::, mt = 100
IP6.ROUTE[2]: dst = ff00::/8, nh = ::, mt = 256, table=255
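I finally found a solution that combines two changes. After applying both, my pods can resolve addresses correctly.
Kubelet configuration
Following the known issues page, change the resolv-conf path used by Kubelet: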
# Add resolv-conf flag to Kubelet configuration
echo "--resolv-conf=/run/systemd/resolve/resolv.conf" | sudo tee -a /var/snap/microk8s/current/args/kubelet
# Restart Kubelet
sudo service snap.microk8s.daemon-kubelet restart
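Running the append more than once leaves duplicate flags in the args file. A small guard avoids that. This is only a sketch; the helper name add_arg_once is hypothetical and not part of microk8s.

```shell
# Append a flag line to an args file only if that exact line is not already present.
# Usage: add_arg_once FILE FLAG
add_arg_once() {
  file=$1
  flag=$2
  # -x matches whole lines, -F treats the flag as a fixed string,
  # -e keeps a flag that starts with "--" from being parsed as a grep option.
  if ! grep -qxF -e "$flag" "$file" 2>/dev/null; then
    printf '%s\n' "$flag" >> "$file"
  fi
}
```

From a root shell it would be called as add_arg_once /var/snap/microk8s/current/args/kubelet "--resolv-conf=/run/systemd/resolve/resolv.conf", followed by the Kubelet restart shown above.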
CoreDNS forwarding
Change the forward address in the CoreDNS ConfigMap from the default (8.8.8.8 8.8.4.4) to the DNS server of the eth0 device.
# Dump definition of CoreDNS
microk8s.kubectl get configmap -n kube-system coredns -o yaml > coredns.yaml
Relevant part of coredns.yaml:
Corefile: |
  .:53 {
      errors
      health {
        lameduck 5s
      }
      ready
      log . {
        class error
      }
      kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . 8.8.8.8 8.8.4.4
      cache 30
      loop
      reload
      loadbalance
  }
Get the DNS server:
# Fetch eth0 DNS address (this will print 172.19.120.177 in my case)
nmcli dev show 2>/dev/null | grep DNS | sed 's/^.*:\s*//'
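The grep/sed pipeline above matches every line containing DNS on every device, and the greedy sed pattern would also cut into an IPv6 address. A slightly stricter variant that keeps only the IP4.DNS entries could look like this; it is a sketch and the function name ipv4_dns_servers is hypothetical.

```shell
# Print the IPv4 DNS servers found in `nmcli device show` output read from stdin.
ipv4_dns_servers() {
  # Match only lines such as "IP4.DNS[1]:   172.19.120.177" and print the address field.
  awk '/^IP4\.DNS\[[0-9]+\]:/ {print $2}'
}
```

Usage: nmcli device show eth0 | ipv4_dns_servers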
Change the following line and save the file:
forward . 8.8.8.8 8.8.4.4 # From this
forward . 172.19.120.177 # To this (your DNS will probably be different)
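The same edit can be scripted instead of done by hand. This is a minimal sketch that assumes the dumped file contains a single forward . line, as in the Corefile above; the function name set_coredns_forward is hypothetical.

```shell
# Rewrite the `forward . ...` line of a dumped CoreDNS ConfigMap to use one upstream.
# Usage: set_coredns_forward FILE DNS_IP
set_coredns_forward() {
  file=$1
  dns=$2
  tmp=$(mktemp) || return 1
  # Keep the original indentation; replace everything after "forward . ".
  if sed "s/^\([[:space:]]*forward \. \).*/\1$dns/" "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # write back in place so the file keeps its permissions
  fi
  rm -f "$tmp"
}
```

Usage: set_coredns_forward coredns.yaml 172.19.120.177 (substitute the DNS address of your own eth0 device).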
Finally, apply the CoreDNS forwarding change:
microk8s.kubectl apply -f coredns.yaml