Accessing service from an Alpine-based k8s pod is throwing a DNS Resolution error
I have pod A (which is actually the kube-scheduler pod) and pod B (a pod exposing a REST API that pod A will call).
For this, I created a ClusterIP service.
Now, when I exec into pod A to make the API call to pod B, I get:
curl: (6) Could not resolve host: my-svc.default.svc.cluster.local
I tried following the debugging instructions mentioned here:
kubectl exec -i -t dnsutils -- nslookup my-svc.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: my-svc.default.svc.cluster.local
Address: 10.111.181.13
And also:
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
This seems to work as expected. However, when I exec into pod A, I get:
kubectl exec -it kube-scheduler -n kube-system -- sh
/bin # nslookup kubernetes.default
Server: 8.8.8.8
Address: 8.8.8.8:53
** server can't find kubernetes.default: NXDOMAIN
** server can't find kubernetes.default: NXDOMAIN
Other debugging steps (inside pod A) include:
/bin # cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 172.30.0.1
And:
/bin # cat /etc/*-release
3.12.8
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.12.8
PRETTY_NAME="Alpine Linux v3.12"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
There are also no useful logs from the coredns pods.
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
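From the documentation, there appears to be a known issue with Alpine and DNS resolution (even though the version I have is newer than the one they mention).
Is there a workaround that allows the service to be accessed properly from the Alpine pod?
Edit: providing the pod A manifest: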
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --config=/etc/kubernetes/sched-cs.yaml
    - --port=0
    image: localhost:5000/scheduler-plugins/kube-scheduler:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/sched-cs.yaml
      name: sched-cs
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/sched-cs.yaml
      type: FileOrCreate
    name: sched-cs
status: {}
Edit 2:
Manually adding the following lines to /etc/resolv.conf in pod A allows me to run the curl request successfully.
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Isn't there a cleaner, less manual way to achieve the same result?
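The error curl: (6) Could not resolve host is mainly caused by incorrect DNS settings or a misconfiguration on the server. You can find an explanation of this problem.
If you want to apply a custom DNS configuration, you can do so according to this documentation: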
If a Pod's dnsPolicy is set to default, it inherits the name resolution configuration from the node that the Pod runs on. The Pod's DNS resolution should behave the same as the node. But see Known issues.
If you don't want this, or if you want a different DNS config for pods, you can use the kubelet's --resolv-conf flag. Set this flag to "" to prevent Pods from inheriting DNS. Set it to a valid file path to specify a file other than /etc/resolv.conf for DNS inheritance.
Another solution is to create your own system image in which you have already placed the values you are interested in.
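As a pod-level alternative to the kubelet flag or a custom image, the same Kubernetes DNS documentation also describes a dnsConfig field that can be combined with dnsPolicy: "None". The fragment below is only a sketch of how the three lines added by hand in Edit 2 might be expressed that way. It assumes the values from the question (nameserver 10.96.0.10 and the default namespace search domains) and shows only the fields that would be added to the pod A spec, not a complete manifest.

```yaml
# Sketch only: fields to merge into the existing pod A spec.
# The values mirror the resolv.conf lines from Edit 2 and are
# specific to that cluster; adjust them for a different setup.
spec:
  hostNetwork: true
  # "None" tells the kubelet to ignore cluster and node DNS settings
  # and build resolv.conf purely from dnsConfig below.
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 10.96.0.10          # kube-dns service IP seen in the nslookup output
    searches:
      - default.svc.cluster.local
      - svc.cluster.local
      - cluster.local
    options:
      - name: ndots
        value: "5"
```

This hard-codes the DNS service IP into the manifest, so it may be less portable than letting a dnsPolicy value derive the settings from the cluster.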
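Try setting the dnsPolicy for pod A (or in the pod template of whatever Deployment, StatefulSet, etc. defines it) to ClusterFirst or ClusterFirstWithHostNet.
The behavior of this setting depends on how your cluster and kubelet are set up, but in most default configurations it makes the kubelet set resolv.conf inside the pod to use the kube-dns service that you set manually in your edit (10.96.0.10), which forwards lookups for names outside the cluster to the host's nameservers.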
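For the manifest in the question, which sets hostNetwork: true, the relevant value is most likely ClusterFirstWithHostNet: the Kubernetes DNS documentation states that pods running with hostNetwork and ClusterFirst fall back to the behavior of the Default policy, which would match the node-inherited resolv.conf seen inside pod A. A minimal sketch of the change, showing only the affected fields of the pod A spec:

```yaml
# Sketch only: the single field added to the pod A spec from the question.
spec:
  hostNetwork: true
  # With hostNetwork: true, plain ClusterFirst behaves like Default
  # (node resolv.conf), so the host-network variant is set explicitly.
  dnsPolicy: ClusterFirstWithHostNet
```

After the pod is recreated with this field, cat /etc/resolv.conf inside pod A would be expected to list the cluster DNS service instead of 8.8.8.8, though the exact contents depend on the kubelet configuration.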