Accessing a service from an Alpine-based k8s pod throws a DNS resolution error

Question

I have pod A (which is actually the kube-scheduler pod) and pod B (a pod exposing a REST API that pod A will call).

For this purpose, I created a ClusterIP service.

Now, when I exec into pod A to make the API call to pod B, I get: curl: (6) Could not resolve host: my-svc.default.svc.cluster.local

I tried following the debugging instructions mentioned here:

kubectl exec -i -t dnsutils -- nslookup my-svc.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   my-svc.default.svc.cluster.local
Address: 10.111.181.13

And:

kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

This seems to work as expected. However, when I exec into pod A, I get:

kubectl exec -it kube-scheduler -n kube-system -- sh
/bin # nslookup kubernetes.default
Server:         8.8.8.8
Address:        8.8.8.8:53

** server can't find kubernetes.default: NXDOMAIN

** server can't find kubernetes.default: NXDOMAIN

Other debugging steps (inside pod A) include:

/bin # cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 172.30.0.1

And:

/bin # cat /etc/*-release
3.12.8
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.12.8
PRETTY_NAME="Alpine Linux v3.12"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"

There are also no useful logs from the coredns pods:

kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d

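According to the documentation, there appears to be a known issue with Alpine and DNS resolution (even though the version I have is newer than the one they mention).

Is there a workaround that lets an Alpine pod access services correctly?

Edit: here is the pod A manifest: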

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --config=/etc/kubernetes/sched-cs.yaml
    - --port=0
    image: localhost:5000/scheduler-plugins/kube-scheduler:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/sched-cs.yaml
      name: sched-cs
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/sched-cs.yaml
      type: FileOrCreate
    name: sched-cs
status: {}

Edit 2: Manually adding the following lines to /etc/resolv.conf in pod A allowed me to run the curl request successfully.

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Isn't there a cleaner, less manual way to achieve the same result?

Answer 1

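The error curl: (6) Could not resolve host is usually caused by incorrect DNS settings on the client or by a misconfiguration on the server. You can find an explanation of this problem.

If you want to apply a custom DNS configuration, you can do so according to this documentation: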

If a Pod's dnsPolicy is set to default, it inherits the name resolution configuration from the node that the Pod runs on. The Pod's DNS resolution should behave the same as the node. But see Known issues.

If you don't want this, or if you want a different DNS config for pods, you can use the kubelet's --resolv-conf flag. Set this flag to "" to prevent Pods from inheriting DNS. Set it to a valid file path to specify a file other than /etc/resolv.conf for DNS inheritance.
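The same setting can also be expressed in the kubelet configuration file instead of on the command line, through the resolvConf field of KubeletConfiguration. The fragment below is only a sketch: the file path shown is a hypothetical example, not something taken from the question, and the change affects every pod on that node that inherits DNS from the node.

```yaml
# Sketch of a kubelet configuration fragment (assumption: the kubelet on this
# node is started with a config file, as is common on kubeadm clusters).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hypothetical path: a resolv.conf prepared for pods that inherit DNS
# from the node, used instead of the node's /etc/resolv.conf.
resolvConf: /etc/kubernetes/pod-resolv.conf
```

The kubelet has to be restarted to pick up the change, and already running pods keep their old resolv.conf until they are recreated.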

Another solution is to build your own system image that already contains the values you need.
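If the goal is only to reproduce the manual resolv.conf edit from the question without touching the node or the image, the Pod spec itself can carry the values through dnsPolicy: "None" together with dnsConfig. This is a minimal sketch, assuming the nameserver and search domains from the question's Edit 2 are the right ones for the cluster; only the DNS-related fields are shown.

```yaml
# Fragment of the pod A spec; all other fields stay as in the question's manifest.
spec:
  hostNetwork: true
  # "None" tells Kubernetes to ignore the node's DNS settings and build
  # the pod's resolv.conf purely from dnsConfig below.
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - 10.96.0.10            # kube-dns service IP from the question
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"
```

The drawback of this approach is that the cluster DNS IP is hard-coded in the manifest, so it has to be kept in sync if the kube-dns service address ever changes.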

Answer 2

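Try setting the dnsPolicy of Pod A (or of whatever Deployment, StatefulSet, etc. defines its template) to ClusterFirst or ClusterFirstWithHostNet.

The behavior of this setting depends on how your cluster and kubelet are set up, but in most default configurations it makes the kubelet write a resolv.conf inside the pod that uses the kube-dns service, the one you set manually in your edit (10.96.0.10), which in turn forwards lookups for names outside the cluster to the host's nameservers.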
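As a sketch of what this looks like for the manifest in the question: because pod A runs with hostNetwork: true, ClusterFirstWithHostNet is the value to use, since the Kubernetes documentation notes that a hostNetwork pod with ClusterFirst falls back to the Default policy. Only the relevant fields are shown, and the rest of the manifest is assumed to stay unchanged.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  # Use the cluster DNS service even though the pod shares the node's network.
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: kube-scheduler
    image: localhost:5000/scheduler-plugins/kube-scheduler:latest
    command:
    - kube-scheduler
    # remaining flags, probes, and volumes as in the question's manifest
```

After the pod is recreated, cat /etc/resolv.conf inside it should list the cluster DNS address and the cluster search domains rather than the node's nameservers.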

K8s docs
