Cannot ping ClusterIP from inside the pod, and DNS is not working for external domains like google.com

I have installed Kubernetes on bare-metal Ubuntu. I am at commit 6b649d7f9f2b09ca8b0dd8c0d3e14dcb255432d1 in git. I used cd kubernetes/cluster; KUBERNETES_PROVIDER=ubuntu ./kube-up.sh followed by cd kubernetes/cluster/ubuntu; ./deployAddons.sh to start the cluster. Everything went fine and the cluster came up.

My /ubuntu/config-default.sh is as follows:

# Define all your cluster nodes, MASTER node comes first
# And separated with blank space like <user_1@ip_1> <user_2@ip_2> <user_3@ip_3> 
export nodes=${nodes:-"root@192.168.48.170 root@192.168.48.180"}

# Define the role of each node: a(master) or i(minion) or ai(both master and minion), in the same order as the nodes above
role=${role:-"ai i"}
# It is practically impossible to set an array as an environment variable
# from a script, so assume variable is a string then convert it to an array
export roles=($role)

# Define minion numbers
export NUM_NODES=${NUM_NODES:-2}
# define the IP range used for service cluster IPs.
# according to rfc 1918 ref: https://tools.ietf.org/html/rfc1918 choose a private ip range here.
export SERVICE_CLUSTER_IP_RANGE=${SERVICE_CLUSTER_IP_RANGE:-192.168.3.0/24}  # formerly PORTAL_NET
# define the IP range used for flannel overlay network, should not conflict with above SERVICE_CLUSTER_IP_RANGE
export FLANNEL_NET=${FLANNEL_NET:-172.16.0.0/16}

# Optionally add other contents to the Flannel configuration JSON
# object normally stored in etcd as /coreos.com/network/config.  Use
# JSON syntax suitable for insertion into a JSON object constructor
# after other field name:value pairs.  For example:
# FLANNEL_OTHER_NET_CONFIG=', "SubnetMin": "172.16.10.0", "SubnetMax": "172.16.90.0"'

export FLANNEL_OTHER_NET_CONFIG
FLANNEL_OTHER_NET_CONFIG=''

# Admission Controllers to invoke prior to persisting objects in cluster
export ADMISSION_CONTROL=NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota,SecurityContextDeny

# Path to the config file or directory of files of kubelet
export KUBELET_CONFIG=${KUBELET_CONFIG:-""}

# A port range to reserve for services with NodePort visibility
SERVICE_NODE_PORT_RANGE=${SERVICE_NODE_PORT_RANGE:-"30000-32767"}

# Optional: Enable node logging.
ENABLE_NODE_LOGGING=false
LOGGING_DESTINATION=${LOGGING_DESTINATION:-elasticsearch}

# Optional: When set to true, Elasticsearch and Kibana will be setup as part of the cluster bring up.
ENABLE_CLUSTER_LOGGING=false
ELASTICSEARCH_LOGGING_REPLICAS=${ELASTICSEARCH_LOGGING_REPLICAS:-1}

# Optional: When set to true, heapster, Influxdb and Grafana will be setup as part of the cluster bring up.
ENABLE_CLUSTER_MONITORING="${KUBE_ENABLE_CLUSTER_MONITORING:-true}"

# Extra options to set on the Docker command line.  This is useful for setting
# --insecure-registry for local registries.
DOCKER_OPTS=${DOCKER_OPTS:-""}

# Extra options to set on the kube-proxy command line.  This is useful
# for selecting the iptables proxy-mode, for example.
KUBE_PROXY_EXTRA_OPTS=${KUBE_PROXY_EXTRA_OPTS:-""}

# Optional: Install cluster DNS.
ENABLE_CLUSTER_DNS="${KUBE_ENABLE_CLUSTER_DNS:-true}"
# DNS_SERVER_IP must be an IP in SERVICE_CLUSTER_IP_RANGE
DNS_SERVER_IP=${DNS_SERVER_IP:-"192.168.3.10"}
DNS_DOMAIN=${DNS_DOMAIN:-"cluster.local"}
DNS_REPLICAS=${DNS_REPLICAS:-1}

# Optional: Install Kubernetes UI
ENABLE_CLUSTER_UI="${KUBE_ENABLE_CLUSTER_UI:-true}"

# Optional: Enable setting flags for kube-apiserver to turn on behavior in active-dev
RUNTIME_CONFIG="--basic-auth-file=password.csv"

# Optional: Add http or https proxy when download easy-rsa.
# Add environment variables separated with blank space like "http_proxy=http://10.x.x.x:8080 https_proxy=https://10.x.x.x:8443"
PROXY_SETTING=${PROXY_SETTING:-""}

DEBUG=${DEBUG:-"false"}
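The config comments above require DNS_SERVER_IP to be inside SERVICE_CLUSTER_IP_RANGE. A minimal sketch of how that could be checked before bringing the cluster up; the helper names ip_to_int and ip_in_cidr are made up for illustration and are not part of the Kubernetes scripts. It handles IPv4 only.

```shell
# Hypothetical helpers, not part of kube-up.sh.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) if IPv4 address $1 lies inside CIDR block $2.
ip_in_cidr() {
  ip=$1
  net=${2%/*}          # part before the slash
  bits=${2#*/}         # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Values taken from the config above.
SERVICE_CLUSTER_IP_RANGE=192.168.3.0/24
DNS_SERVER_IP=192.168.3.10

if ip_in_cidr "$DNS_SERVER_IP" "$SERVICE_CLUSTER_IP_RANGE"; then
  echo "DNS_SERVER_IP is inside SERVICE_CLUSTER_IP_RANGE"
else
  echo "DNS_SERVER_IP is outside SERVICE_CLUSTER_IP_RANGE"
fi
```

With the values from this config the check passes, so the DNS service IP itself is not the problem here.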

Then I created a pod using the following yml file:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80

and a service using the following yml:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  ports:
  - port: 8000
    targetPort: 80
    protocol: TCP
  selector:
    app: nginx
  type: NodePort

Then I opened a shell in the running container using docker exec -it [CONTAINER_ID] bash. There are two main problems:

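  1. I cannot ping external domains like google.com, but I can ping external IPs like 8.8.8.8. So the container does have internet access.
  2. Internal services resolve to the correct internal ClusterIPs, but I cannot ping those IPs from inside the container.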

The host's /etc/resolv.conf file is as follows:

nameserver 8.8.8.8
nameserver 127.0.1.1

The container's /etc/resolv.conf file is as follows:

search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 192.168.3.10
nameserver 8.8.8.8
nameserver 127.0.1.1
options ndots:5
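A note on the resolv.conf above: with options ndots:5, a name with fewer than five dots, such as google.com, is normally tried with each search domain appended before it is tried as-is, and the queries go to the first nameserver listed (192.168.3.10, the cluster DNS) first. The sketch below is a simplified model of that ordering, not the resolver's actual code; the function name candidates is made up for illustration.

```shell
# Print, in order, the fully qualified names a resolver would try for
# query $1, given ndots $2 and the search domains in the remaining arguments.
# Simplified model: ignores names that already end in a dot.
candidates() {
  name=$1
  ndots=$2
  shift 2
  # Count the dots in the name.
  only_dots=$(printf '%s' "$name" | tr -cd '.')
  dots=${#only_dots}
  # Enough dots: the name is tried as-is first.
  if [ "$dots" -ge "$ndots" ]; then echo "$name."; fi
  for domain in "$@"; do echo "$name.$domain."; done
  # Too few dots: the name is tried as-is only after the search list.
  if [ "$dots" -lt "$ndots" ]; then echo "$name."; fi
}

candidates google.com 5 default.svc.cluster.local svc.cluster.local cluster.local
# google.com.default.svc.cluster.local.
# google.com.svc.cluster.local.
# google.com.cluster.local.
# google.com.
```

So even a lookup of google.com depends on the cluster DNS server being able to forward queries it is not authoritative for.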

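Regarding the first problem, I think it might be related to a misconfigured SkyDNS nameserver, or to some custom configuration that I have to do but am not aware of.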

However, I have no idea why the containers cannot ping the ClusterIP.

Are there any workarounds?

I found a workaround. The SkyDNS documentation, in the command line arguments section and specifically for the "nameservers" argument, implies that:

nameservers: forward DNS requests to these (recursive) nameservers (array of IP:port combination), when not authoritative for a domain. This defaults to the servers listed in /etc/resolv.conf

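But that is not the case! To solve the problem, the DNS addon replication controller config file (cluster/addons/dns/skydns-rc.yaml.in) should be changed to include the nameservers configuration. I changed the skydns container part as follows and it worked like a charm.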

  - name: skydns
    image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
    resources:
      # keep request = limit to keep this container in guaranteed class
      limits:
        cpu: 100m
        memory: 50Mi
      requests:
        cpu: 100m
        memory: 50Mi
    args:
    # command = "/skydns"
    - -machines=http://127.0.0.1:4001
    - -addr=0.0.0.0:53
    - -nameservers=8.8.8.8:53
    - -ns-rotate=false
    - -domain={{ pillar['dns_domain'] }}.
    ports:
    - containerPort: 53
      name: dns
      protocol: UDP
    - containerPort: 53
      name: dns-tcp
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 30
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 1
      timeoutSeconds: 5
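If you would rather derive the -nameservers value from the host's resolv.conf than hard-code 8.8.8.8, something like the following could work. This is only a sketch: the function name upstream_arg is made up, port 53 is assumed for every upstream, and I am assuming the flag accepts a comma-separated list of IP:port pairs. Loopback entries such as 127.0.1.1 are skipped because inside a container they point at the container itself, not at the host's local resolver.

```shell
# Read resolv.conf-style text on stdin and print a -nameservers argument
# that lists every non-loopback nameserver as IP:53.
upstream_arg() {
  awk 'BEGIN { printf "-nameservers=" }
       $1 == "nameserver" && $2 !~ /^127\./ { printf "%s%s:53", sep, $2; sep = "," }
       END { print "" }'
}

# The host resolv.conf from the question.
printf 'nameserver 8.8.8.8\nnameserver 127.0.1.1\n' | upstream_arg
# -nameservers=8.8.8.8:53
```

On a real node you would feed it the file directly, for example upstream_arg < /etc/resolv.conf.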

I can answer your ping ClusterIP question. I ran into the same problem when I wanted to ping a service's cluster IP from a pod.

The conclusion seems to be that the cluster IP cannot be pinged, but the endpoint can be reached using curl with the port.

I am still looking around for details about pinging a virtual IP.

Another way to handle the same DNS problem is to set the upstream servers in the kube-dns ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]

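If the service is implemented with iptables, the clusterIP cannot be pinged, because the iptables rules only match packets for the service's protocol and port (tcp here), so ICMP echo requests match nothing. But when you curl clusterIP:port, the iptables rules DNAT that tcp packet to a pod.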

#ping 10.96.229.40
PING 10.96.229.40 (10.96.229.40) 56(84) bytes of data.
^C
--- 10.96.229.40 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms


#iptables-save |grep 10.96.229.40
-A KUBE-SERVICES -d 10.96.229.40/32 -p tcp -m comment --comment "***-service:https has no endpoints" -m tcp --dport 8443 -j REJECT --reject-with icmp-port-unreachable
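To see at a glance which protocol and port the rules for a given clusterIP actually match, the iptables-save output can be filtered. A rough sketch; the function name service_rules is made up, and it only looks at the -d, -p and --dport fields of each rule.

```shell
# Read iptables-save output on stdin and print "protocol/port" for every
# rule whose destination (-d) is the IP given as $1.
service_rules() {
  awk -v ip="$1/32" '
    {
      proto = ""; port = ""; hit = 0
      for (i = 1; i < NF; i++) {
        if ($i == "-d" && $(i + 1) == ip) hit = 1
        if ($i == "-p") proto = $(i + 1)
        if ($i == "--dport") port = $(i + 1)
      }
      if (hit) print proto "/" port
    }'
}

# The rule from the output above, fed in as sample input.
printf '%s\n' '-A KUBE-SERVICES -d 10.96.229.40/32 -p tcp -m comment --comment "***-service:https has no endpoints" -m tcp --dport 8443 -j REJECT --reject-with icmp-port-unreachable' \
  | service_rules 10.96.229.40
# tcp/8443
```

On a node you would run iptables-save | service_rules 10.96.229.40 as root. Only tcp/8443 shows up, and nothing for icmp, which is consistent with ping getting no reply.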

If the service uses ipvs, you can ping the clusterIP. But the reply comes from the local loopback device, because kube-proxy adds a route for the clusterIP to lo.

# ip route get 10.68.155.139
local 10.68.155.139 dev lo src 10.68.155.139 
    cache <local> 
# ping -c 1 10.68.155.139
PING 10.68.155.139 (10.68.155.139) 56(84) bytes of data.
64 bytes from 10.68.155.139: icmp_seq=1 ttl=64 time=0.045 ms

--- 10.68.155.139 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.045/0.045/0.045/0.000 ms

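I am using the Azure AKS service and ran into a similar problem. I was able to ping pod IPs but could not ping service endpoints. As a workaround, I used a headless service, whose endpoints can be pinged.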

[ root@curl:/ ]$ ping  sfs-statefulset-0.headless-svc.myns.svc.cluster.local
PING sfs-statefulset-0.headless-svc.myns.svc.cluster.local (10.244.0.37): 56 data bytes
64 bytes from 10.244.0.37: seq=0 ttl=64 time=0.059 ms
64 bytes from 10.244.0.37: seq=1 ttl=64 time=0.090 ms
64 bytes from 10.244.0.37: seq=2 ttl=64 time=0.087 ms
^C
--- sfs-statefulset-0.headless-svc.myns.svc.cluster.local ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.059/0.078/0.090 ms
[ root@curl:/ ]$ ping headless-svc.myns.svc.cluster.local
PING headless-svc.myns.svc.cluster.local (10.244.0.36): 56 data bytes
64 bytes from 10.244.0.36: seq=0 ttl=64 time=0.051 ms
64 bytes from 10.244.0.36: seq=1 ttl=64 time=0.092 ms
64 bytes from 10.244.0.36: seq=2 ttl=64 time=0.098 ms
64 bytes from 10.244.0.36: seq=3 ttl=64 time=0.083 ms