kube-dns getsockopt no route to host
I'm trying to understand how to correctly set up kube-dns on Kubernetes 1.10 with flannel and containerd as the CRI runtime.
kube-dns fails to run, with the following errors:
kubectl -n kube-system logs kube-dns-595fdb6c46-9tvn9 -c kubedns
I0424 14:56:34.944476 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:35.444469 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
I0424 14:56:35.944444 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.444462 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.944507 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
F0424 14:56:37.444434 1 dns.go:209] Timeout waiting for initialization
kubectl -n kube-system describe pod kube-dns-595fdb6c46-9tvn9
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 47m (x181 over 3h) kubelet, worker1 Readiness probe failed: Get http://10.244.0.2:8081/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 27m (x519 over 3h) kubelet, worker1 Back-off restarting failed container
Normal Killing 17m (x44 over 3h) kubelet, worker1 Killing container with id containerd://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 12m (x178 over 3h) kubelet, worker1 Liveness probe failed: Get http://10.244.0.2:10054/metrics: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 2m (x855 over 3h) kubelet, worker1 Back-off restarting failed container
There is indeed no route to the 10.96.0.1 endpoint:
ip route
default via 10.240.0.254 dev ens160
10.240.0.0/24 dev ens160 proto kernel scope link src 10.240.0.21
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.0.0/16 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink
10.244.5.0/24 via 10.244.5.0 dev flannel.1 onlink
What is responsible for configuring the cluster service address range and the associated routes? Is it the container runtime, the overlay network (flannel in this case), or something else? Where should it be configured?
10-containerd-net.conflist configures the bridge between the host and my pod network. Can the service network be configured there too? (See the note after this config.)
cat /etc/cni/net.d/10-containerd-net.conflist
{
"cniVersion": "0.3.1",
"name": "containerd-net",
"plugins": [
{
"type": "bridge",
"bridge": "cni0",
"isGateway": true,
"ipMasq": true,
"promiscMode": true,
"ipam": {
"type": "host-local",
"subnet": "10.244.0.0/16",
"routes": [
{ "dst": "0.0.0.0/0" }
]
}
},
{
"type": "portmap",
"capabilities": {"portMappings": true}
}
]
}
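For reference, the service address range is not something a CNI conflist knows about; it is configured on the kube-apiserver (via its --service-cluster-ip-range flag), and 10.96.0.1 falls inside that range. A quick way to check on a controller node, assuming a binary install like this one:
ps -ef | grep kube-apiserver | grep -o -- '--service-cluster-ip-range=[^ ]*'
# e.g. --service-cluster-ip-range=10.96.0.0/12 (whatever range yields 10.96.0.1 here)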
EDIT:
Just came across this from 2016:
As of a few weeks ago (I forget the release but it was a 1.2.x where x
!= 0) (#24429) we fixed the routing such that any traffic that arrives
at a node destined for a service IP will be handled as if it came to a
node port. This means you should be able to set yo static routes for
your service cluster IP range to one or more nodes and the nodes will
act as bridges. This is the same trick most people do with flannel to
bridge the overlay.
It's imperfect but it works. In the future will will need to get more
precise with the routing if you want optimal behavior (i.e. not losing
the client IP), or we will see more non-kube-proxy implementations of
services.
Is this still relevant? Do I need to set up static routes for the service CIDR? Or is the problem actually with kube-proxy rather than flannel or containerd?
My flannel config:
cat /etc/cni/net.d/10-flannel.conflist
{
"name": "cbr0",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
And kube-proxy:
[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-proxy \
--cluster-cidr=10.244.0.0/16 \
--feature-gates=SupportIPVSProxyMode=true \
--ipvs-min-sync-period=5s \
--ipvs-sync-period=5s \
--ipvs-scheduler=rr \
--kubeconfig=/etc/kubernetes/kube-proxy.conf \
--logtostderr=true \
--master=https://192.168.160.1:6443 \
--proxy-mode=ipvs \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
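Since this runs kube-proxy in IPVS mode, whether the service VIP ever got programmed can be checked directly on the node (a rough sketch; it assumes ipvsadm is installed, and kube-ipvs0 is the dummy interface kube-proxy's IPVS mode creates):
# Are ClusterIPs bound to the IPVS dummy interface?
ip addr show kube-ipvs0
# Is there an IPVS virtual server for the API service VIP?
ipvsadm -Ln | grep -A2 10.96.0.1
# Has kube-proxy written any KUBE-* iptables chains at all?
iptables-save | grep -c 'KUBE-'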
EDIT:
Having gone through the kube-proxy debugging steps, it looks like kube-proxy cannot reach the master. I suspect this is a large part of the problem. I have 3 controller/master nodes behind an HAProxy load balancer bound to 192.168.160.1:6443, which round-robins to each master on 10.240.0.1[1|2|3]:6443. This can be seen in the output/configs above.
In kube-proxy.service I have specified --master=192.168.160.1:6443. Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?
There are two components to this answer: one about running kube-proxy, and one about where that :443 URL is coming from.
First, about kube-proxy: please don't run kube-proxy as a systemd service like that. It is designed to be launched by kubelet in the cluster so that the SDN addresses behave sensibly, since they are effectively "fake" addresses. Running kube-proxy outside of kubelet's control means all kinds of strange things will happen, unless you expend a great deal of effort replicating the way kubelet configures its subordinate docker containers.
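For comparison, kubeadm-provisioned clusters run kube-proxy inside the cluster as a DaemonSet, with its settings kept in a ConfigMap, rather than as a host unit; roughly (names assume a kubeadm-style install):
kubectl -n kube-system get daemonset kube-proxy
kubectl -n kube-system get configmap kube-proxy -o yaml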
Now, about that :443 URL:
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
...
Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?
10.96.0.1 comes from the cluster's service CIDR, which is (and should be) separate from the Pod CIDR, which in turn should be separate from the node subnets, and so on. The .1 of the cluster's service CIDR is reserved for (or traditionally assigned to) the kubernetes.default.svc.cluster.local Service, with its one Service.port being 443.
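A quick way to confirm that mapping against this cluster (the output shape is illustrative):
kubectl -n default get service kubernetes
# NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
# kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   ...
kubectl -n default get endpoints kubernetes
# The endpoints listed should be the real apiserver addresses (the 10.240.0.1x:6443 controllers here).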
I'm not entirely sure why the --master flag doesn't supersede the value in /etc/kubernetes/kube-proxy.conf, but since that file is quite clearly only meant to be used by kube-proxy, why not just update the value in the file to remove all doubt?
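The value in question is the server: field of that kubeconfig; for example (the expected address being the HAProxy frontend described above):
grep 'server:' /etc/kubernetes/kube-proxy.conf
#     server: https://192.168.160.1:6443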