kube-dns cannot find api-server

I am setting up Kubernetes on GKE as described in Kelsey Hightower's Kubernetes the Hard Way: https://github.com/kelseyhightower/kubernetes-the-hard-way/

Everything works except setting up the DNS cluster add-on: https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/12-dns-addon.md

When I start kube-dns like this:

kubectl create -f https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml

I do get the expected output:

serviceaccount "kube-dns" created
configmap "kube-dns" created
service "kube-dns" created
deployment "kube-dns" created

But when I check the status of the pods and the output of the kube-dns container, I see errors:

kubectl get po -n kube-system
NAME                        READY     STATUS             RESTARTS   AGE
kube-dns-6c857864fb-cpvvr   2/3       CrashLoopBackOff   63         2h

And in the container log:

I0115 13:22:35.272492       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0115 13:22:35.772476       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0115 13:22:36.272406       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0115 13:22:36.772356       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0115 13:22:37.272386       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
E0115 13:22:37.273178       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.32.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.32.0.1:443: i/o timeout
E0115 13:22:37.273340       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.32.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.32.0.1:443: i/o timeout

The URL https://10.32.0.1:443 in the container log seems to be wrong, but I cannot find anywhere to specify a different URL, nor can I find where this URL is set in the config file https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml

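The URL comes from internal Kubernetes information (the in-cluster configuration used alongside the service account token) and it should be fine: it should point to the first IP assigned in the service network range, which should be the kubernetes.default service. What you need to check is whether your pod-to-pod networking and kube-proxy (which implements the service ClusterIP) are working as expected.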
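The "first IP in the service range" rule can be checked with a little arithmetic. The following is a minimal sketch, not part of any Kubernetes tooling: first_service_ip is a hypothetical helper, and the CIDR 10.32.0.0/24 in the usage line is an assumption (substitute the value passed to the API server's --service-cluster-ip-range flag in your setup).

```shell
# first_service_ip CIDR
# Hypothetical helper: prints the first address after the network address
# of an IPv4 CIDR, which is the IP Kubernetes assigns to kubernetes.default.
first_service_ip() (
  base=${1%/*}      # address part, e.g. 10.32.0.0
  prefix=${1#*/}    # prefix length, e.g. 24

  # Split the dotted address into its four octets ($1..$4).
  IFS=.
  set -- $base
  unset IFS

  ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  first=$(( (ip & mask) + 1 ))   # network address + 1

  printf '%d.%d.%d.%d\n' \
    $(( (first >> 24) & 255 )) $(( (first >> 16) & 255 )) \
    $(( (first >> 8) & 255 ))  $(( first & 255 ))
)
```

Usage, assuming the service range is 10.32.0.0/24: first_service_ip 10.32.0.0/24 prints 10.32.0.1, which matches the address in the log above.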

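If you run kubectl get svc kubernetes -o yaml you should see a kubernetes service with that 10.32.0.1 IP, so confirm that as well. The apiserver registers its own IP for this service, so running kubectl get endpoints kubernetes should give you the API server's IP and port.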
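The service check described above can be wrapped in a small function. This is a sketch only: check_apiserver_service is a hypothetical name, it assumes kubectl is already configured for the cluster, and the expected IP is whatever your service range implies (10.32.0.1 in the question).

```shell
# check_apiserver_service EXPECTED_IP
# Hypothetical helper: compares the ClusterIP of the built-in "kubernetes"
# service in the default namespace with the IP the pods are trying to reach.
check_apiserver_service() (
  expected_ip=$1
  actual_ip=$(kubectl get svc kubernetes -n default \
    -o jsonpath='{.spec.clusterIP}')

  if [ "$actual_ip" = "$expected_ip" ]; then
    echo "OK: kubernetes service has ClusterIP $actual_ip"
  else
    echo "MISMATCH: expected $expected_ip, got $actual_ip"
    return 1
  fi
)
```

Run check_apiserver_service 10.32.0.1, then kubectl get endpoints kubernetes to see the real API server address behind the service. If both look right but pods still time out, the ClusterIP is probably not being routed on the nodes, which points at kube-proxy or the pod network rather than at kube-dns.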

I was using kubespray (version v2.5.0) to set up a Kubernetes (version 1.10.4) cluster on OpenStack and ran into exactly the same error message. Google brought me here, but there was no solution to the problem.

What finally worked for me was changing the kube_proxy_mode option in inventory/mycluster/group_vars/kube-cluster.yml from the default 'iptables' to 'ipvs':

# Kube-proxy proxyMode configuration.
# Can be ipvs, iptables
kube_proxy_mode: ipvs

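After re-running the ansible playbook command, the problem went away and all services and pods were running as expected. I hope this helps anyone trying to set up a Kubernetes cluster with the same toolchain.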
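To confirm that the mode change actually took effect, kube-proxy can be asked which mode it is running in. The sketch below assumes it is run on a cluster node and that kube-proxy's metrics endpoint listens on its default address, 127.0.0.1:10249; verify_proxy_mode is a hypothetical helper name.

```shell
# verify_proxy_mode EXPECTED_MODE
# Hypothetical helper: queries kube-proxy's /proxyMode endpoint on the local
# node (assumes the default metrics address 127.0.0.1:10249) and compares it
# with the expected mode, e.g. "ipvs" or "iptables".
verify_proxy_mode() (
  expected=$1
  actual=$(curl -s http://localhost:10249/proxyMode)

  if [ "$actual" = "$expected" ]; then
    echo "kube-proxy mode is $actual"
  else
    echo "kube-proxy mode is '$actual', expected '$expected'"
    return 1
  fi
)
```

For example, verify_proxy_mode ipvs on a node after the playbook run. In ipvs mode, ipvsadm -Ln (if installed) should additionally list a virtual server for the kubernetes service ClusterIP on port 443.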