Can't resolve monitoring-influxdb on Kubernetes with heapster and kube-dns

I'm trying to get Heapster running on my Kubernetes cluster. I'm using kube-dns for DNS resolution.

My kube-dns appears to be set up correctly:

kubectl describe pod kube-dns-v20-z2dd2 -n kube-system

Name:           kube-dns-v20-z2dd2
Namespace:      kube-system
Node:           172.31.48.201/172.31.48.201
Start Time:     Mon, 22 Jan 2018 09:21:49 +0000
Labels:         k8s-app=kube-dns
                version=v20
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
                scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status:         Running
IP:             172.17.29.4
Controlled By:  ReplicationController/kube-dns-v20
Containers:
  kubedns:
    Container ID:  docker://13f95bdf8dee273ca18a2eee1b99fe00e5fff41279776cdef5d7e567472a39dc
    Image:         gcr.io/google_containers/kubedns-amd64:1.8
    Image ID:      docker-pullable://gcr.io/google_containers/kubedns-amd64@sha256:39264fd3c998798acdf4fe91c556a6b44f281b6c5797f464f92c3b561c8c808c
    Ports:         10053/UDP, 10053/TCP
    Args:
      --domain=cluster.local.
      --dns-port=10053
    State:          Running
      Started:      Mon, 22 Jan 2018 09:22:05 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
  dnsmasq:
    Container ID:  docker://576ebc30e8f7aae13000a2d06541c165a3302376ad04c604b12803463380d9b5
    Image:         gcr.io/google_containers/kube-dnsmasq-amd64:1.4
    Image ID:      docker-pullable://gcr.io/google_containers/kube-dnsmasq-amd64@sha256:a722df15c0cf87779aad8ba2468cf072dd208cb5d7cfcaedd90e66b3da9ea9d2
    Ports:         53/UDP, 53/TCP
    Args:
      --cache-size=1000
      --no-resolv
      --server=127.0.0.1#10053
      --log-facility=-
    State:          Running
      Started:      Mon, 22 Jan 2018 09:22:20 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
  healthz:
    Container ID:  docker://3367d05fb0e13c892243a4c86c74a170b0a9a2042387a70f6690ed946afda4d2
    Image:         gcr.io/google_containers/exechealthz-amd64:1.2
    Image ID:      docker-pullable://gcr.io/google_containers/exechealthz-amd64@sha256:503e158c3f65ed7399f54010571c7c977ade7fe59010695f48d9650d83488c0a
    Port:          8080/TCP
    Args:
      --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
      --url=/healthz-dnsmasq
      --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
      --url=/healthz-kubedns
      --port=8080
      --quiet
    State:          Running
      Started:      Mon, 22 Jan 2018 09:22:32 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  50Mi
    Requests:
      cpu:        10m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  default-token-9zxzd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9zxzd
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type    Reason                 Age   From                    Message
  ----    ------                 ----  ----                    -------
  Normal  Scheduled              43m   default-scheduler       Successfully assigned kube-dns-v20-z2dd2 to 172.31.48.201
  Normal  SuccessfulMountVolume  43m   kubelet, 172.31.48.201  MountVolume.SetUp succeeded for volume "default-token-9zxzd"
  Normal  Pulling                43m   kubelet, 172.31.48.201  pulling image "gcr.io/google_containers/kubedns-amd64:1.8"
  Normal  Pulled                 43m   kubelet, 172.31.48.201  Successfully pulled image "gcr.io/google_containers/kubedns-amd64:1.8"
  Normal  Created                43m   kubelet, 172.31.48.201  Created container
  Normal  Started                43m   kubelet, 172.31.48.201  Started container
  Normal  Pulling                43m   kubelet, 172.31.48.201  pulling image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4"
  Normal  Pulled                 42m   kubelet, 172.31.48.201  Successfully pulled image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4"
  Normal  Created                42m   kubelet, 172.31.48.201  Created container
  Normal  Started                42m   kubelet, 172.31.48.201  Started container
  Normal  Pulling                42m   kubelet, 172.31.48.201  pulling image "gcr.io/google_containers/exechealthz-amd64:1.2"
  Normal  Pulled                 42m   kubelet, 172.31.48.201  Successfully pulled image "gcr.io/google_containers/exechealthz-amd64:1.2"
  Normal  Created                42m   kubelet, 172.31.48.201  Created container
  Normal  Started                42m   kubelet, 172.31.48.201  Started container

kubectl describe svc kube-dns -n kube-system

Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       <none>
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.254.0.2
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         172.17.29.4:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         172.17.29.4:53
Session Affinity:  None
Events:            <none>

kubectl describe ep kube-dns -n kube-system

Name:         kube-dns
Namespace:    kube-system
Labels:       k8s-app=kube-dns
              kubernetes.io/cluster-service=true
              kubernetes.io/name=KubeDNS
Annotations:  <none>
Subsets:
  Addresses:          172.17.29.4
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    dns      53    UDP
    dns-tcp  53    TCP

Events:  <none>

kubectl exec -it busybox1 -- nslookup kubernetes.default

Server:    10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.254.0.1 kubernetes.default.svc.cluster.local

However, when I try to resolve http://monitoring-influxdb, either from the heapster container or from a busybox container (outside the kube-system namespace), the lookup fails:

kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- nslookup http://monitoring-influxdb

Server:    (null)
Address 1: 127.0.0.1 localhost
Address 2: ::1 localhost

nslookup: can't resolve 'http://monitoring-influxdb': Try again
command terminated with exit code 1

kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- cat /etc/resolv.conf

nameserver 10.254.0.2
search kube-system.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5

kubectl exec -it busybox1 -- nslookup http://monitoring-influxdb

Server:    10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'http://monitoring-influxdb'
command terminated with exit code 1

kubectl exec -it busybox1 -- cat /etc/resolv.conf

nameserver 10.254.0.2
search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5

Finally, here are the logs from the heapster pod. I can't find any errors in the DNS pod logs:

kubectl logs heapster-v1.2.0-7657f45c77-65w7w heapster -n kube-system

E0122 09:22:46.966896       1 influxdb.go:217] issues while creating an InfluxDB sink: failed to ping InfluxDB server at "monitoring-influxdb:8086" - Get http://monitoring-influxdb:8086/ping: dial tcp: lookup monitoring-influxdb on 10.254.0.2:53: server misbehaving, will retry on use

Any pointers would be much appreciated.

Edit:

monitoring-influxdb is in the same namespace as heapster (kube-system).

kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- nslookup monitoring-influxdb.kube-system

Server:    (null)
Address 1: 127.0.0.1 localhost
Address 2: ::1 localhost

nslookup: can't resolve 'monitoring-influxdb.kube-system': Name does not resolve
command terminated with exit code 1

But for whatever reason, busybox is able to resolve the name:

kubectl exec -it busybox1 -- nslookup http://monitoring-influxdb.kube-system

Server:    10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local

Name:      monitoring-influxdb.kube-system
Address 1: 10.254.48.109 monitoring-influxdb.kube-system.svc.cluster.local

kubectl -n kube-system get svc

NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
heapster               ClusterIP   10.254.193.208   <none>        80/TCP              1h
kube-dns               ClusterIP   10.254.0.2       <none>        53/UDP,53/TCP       1h
kubernetes-dashboard   NodePort    10.254.89.241    <none>        80:32431/TCP        1h
monitoring-grafana     ClusterIP   10.254.176.96    <none>        80/TCP              1h
monitoring-influxdb    ClusterIP   10.254.48.109    <none>        8083/TCP,8086/TCP   1h

kubectl -n kube-system get ep

NAME                      ENDPOINTS                           AGE
heapster                  172.17.29.7:8082                    1h
kube-controller-manager   <none>                              1h
kube-dns                  172.17.29.6:53,172.17.29.6:53       1h
kubernetes-dashboard      172.17.29.5:9090                    1h
monitoring-grafana        172.17.29.3:3000                    1h
monitoring-influxdb       172.17.29.3:8086,172.17.29.3:8083   1h

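In Kubernetes, you can resolve a service by its bare name, but only from a pod in the same namespace.

A service can also be reached from any namespace through a DNS name of the form: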

<service name>.<namespace>

It isn't clear from your question which namespace you deployed influxdb in, but please try the suggestion above.
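
To make the namespace rule concrete, here is a rough sketch of how a glibc-style stub resolver might expand a short name using the `search` and `ndots` settings from `/etc/resolv.conf`. The function name `candidate_names` is made up for illustration, and this is a simplified model: the resolver actually shipped in the heapster or busybox images may behave differently.

```python
def candidate_names(name, search, ndots=1):
    """Return the names a glibc-style resolver would try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search domains are applied.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    as_is = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [as_is] + expanded
    # Too few dots: walk the search list first, then fall back to the name as given.
    return expanded + [as_is]


# Search lists copied from the two /etc/resolv.conf files shown in the question.
HEAPSTER_SEARCH = [
    "kube-system.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "eu-central-1.compute.internal",
]
BUSYBOX_SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "eu-central-1.compute.internal",
]

if __name__ == "__main__":
    # The bare name is tried in the pod's own namespace first.
    print(candidate_names("monitoring-influxdb", HEAPSTER_SEARCH, ndots=5)[0])
    print(candidate_names("monitoring-influxdb", BUSYBOX_SEARCH, ndots=5)[0])
    # With the namespace appended, the second search domain yields the service's full name.
    for candidate in candidate_names("monitoring-influxdb.kube-system", BUSYBOX_SEARCH, ndots=5):
        print(candidate)
```

Under this model the bare name `monitoring-influxdb` is first tried in the pod's own namespace, so it could only match from a pod in kube-system, while `monitoring-influxdb.kube-system` expands to the service's full `svc.cluster.local` name from any namespace.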