Inconsistent responses from the Kubernetes API service, sometimes with a "No route to host" error

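I set up a Kubernetes cluster by following Kelsey Hightower's Kubernetes the Hard Way.

Unfortunately, when I hit the kubernetes service IP from a worker node to check the version, I get inconsistent responses.

Here are my cluster details: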

root@kubem1:~# kubectl get no
NAME     STATUS   ROLES    AGE   VERSION
kubew1   Ready    <none>   14h   v1.18.3
kubew2   Ready    <none>   14h   v1.18.3
root@kubem1:~# kubectl get no -o wide
NAME     STATUS   ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kubew1   Ready    <none>   14h   v1.18.3   192.168.56.103   <none>        Ubuntu 18.04.4 LTS   4.15.0-76-generic   containerd://1.2.9
kubew2   Ready    <none>   14h   v1.18.3   192.168.56.104   <none>        Ubuntu 18.04.4 LTS   4.15.0-76-generic   containerd://1.2.9
root@kubem1:~# kubectl get svc -o wide
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE   SELECTOR
kubernetes   ClusterIP   10.32.0.1    <none>        443/TCP   21h   <none>
root@kubem1:~# kubectl get po -n kube-system -o wide
NAME                       READY   STATUS    RESTARTS   AGE    IP           NODE     NOMINATED NODE   READINESS GATES
coredns-589fff4ffc-mwrpk   1/1     Running   0          163m   10.200.1.5   kubew1   <none>           <none>
coredns-589fff4ffc-qps68   1/1     Running   0          163m   10.200.2.3   kubew2   <none>           <none>
root@kubem1:~#

From the worker node:

kube-proxy systemd unit

cat /etc/systemd/system/kube-proxy.service 
[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-proxy \
  --config=/var/lib/kube-proxy/kube-proxy-config.yaml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

kube-proxy configuration YAML file

cat /var/lib/kube-proxy/kube-proxy-config.yaml
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
clientConnection:
  kubeconfig: "/var/lib/kube-proxy/kubeconfig"
mode: "iptables"
clusterCIDR: "10.200.0.0/16"
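
With `mode: "iptables"`, kube-proxy programs NAT rules that send each new connection to the service IP to one of the service's endpoints. The following is a minimal sketch for listing the backends those rules point at. The helper name `dnat_targets` is hypothetical, and the exact rule layout may vary between kube-proxy versions.

```shell
# Hypothetical helper: print the backend address of every DNAT rule read from stdin.
dnat_targets() {
  grep -o -e '--to-destination [0-9.:]*' | awk '{print $2}'
}

# Usage on a worker node (needs root); lists the backends of all services:
#   iptables-save -t nat | dnat_targets
```

If the kubernetes service has more than one backend in this list, consecutive requests to 10.32.0.1 can land on different API servers.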

kube-proxy service status

root@kubew2:~# service kube-proxy status
● kube-proxy.service - Kubernetes Kube Proxy
   Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-05-26 07:47:22 UTC; 9min ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 11502 (kube-proxy)
    Tasks: 6 (limit: 1111)
   CGroup: /system.slice/kube-proxy.service
           └─11502 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/kube-proxy-config.yaml

May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.697056   11502 node.go:136] Successfully retrieved node IP: 192.168.56.104
May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.699467   11502 server_others.go:186] Using iptables Proxier.
May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.699748   11502 server.go:583] Version: v1.18.3
May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.700110   11502 conntrack.go:52] Setting nf_conntrack_max to 131072
May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.702569   11502 config.go:315] Starting service config controller
May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.702786   11502 shared_informer.go:223] Waiting for caches to sync for service config
May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.702922   11502 config.go:133] Starting endpoints config controller
May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.703039   11502 shared_informer.go:223] Waiting for caches to sync for endpoints config
May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.803627   11502 shared_informer.go:230] Caches are synced for endpoints config
May 26 07:47:22 kubew2 kube-proxy[11502]: I0526 07:47:22.804515   11502 shared_informer.go:230] Caches are synced for service config
root@kubew2:~#

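Here is the problematic output. Two or three times it returns the correct output, then it fails with "No route to host", and after that it works again.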

root@kubew2:~# curl -k https://10.32.0.1:443/version
{
  "major": "1",
  "minor": "18",
  "gitVersion": "v1.18.3",
  "gitCommit": "2e7996e3e2712684bc73f0dec0200d64eec7fe40",
  "gitTreeState": "clean",
  "buildDate": "2020-05-20T12:43:34Z",
  "goVersion": "go1.13.9",
  "compiler": "gc",
  "platform": "linux/amd64"
}

root@kubew2:~# curl -k https://10.32.0.1:443/version
{
  "major": "1",
  "minor": "18",
  "gitVersion": "v1.18.3",
  "gitCommit": "2e7996e3e2712684bc73f0dec0200d64eec7fe40",
  "gitTreeState": "clean",
  "buildDate": "2020-05-20T12:43:34Z",
  "goVersion": "go1.13.9",
  "compiler": "gc",
  "platform": "linux/amd64"
}

root@kubew2:~# curl -k https://10.32.0.1:443/version

curl: (7) Failed to connect to 10.32.0.1 port 443: No route to host

root@kubew2:~# curl -k https://10.32.0.1:443/version
{
  "major": "1",
  "minor": "18",
  "gitVersion": "v1.18.3",
  "gitCommit": "2e7996e3e2712684bc73f0dec0200d64eec7fe40",
  "gitTreeState": "clean",
  "buildDate": "2020-05-20T12:43:34Z",
  "goVersion": "go1.13.9",
  "compiler": "gc",
  "platform": "linux/amd64"
}
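
To put a number on the intermittent failures instead of re-running curl by hand, a small loop can repeat the request and count how many attempts fail. This is a minimal sketch; the helper name `probe` and the 5-second timeout are assumptions, not part of the original setup.

```shell
# Hypothetical helper: run a command N times and report how many attempts failed.
probe() {
  attempts=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Any non-zero exit status (e.g. curl's exit code 7) counts as a failure.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures/$attempts failed"
}

# Usage on a worker node, against the service IP shown above:
#   probe 10 curl -sk --max-time 5 https://10.32.0.1:443/version
```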

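I found the issue. Since this is a high-availability setup, there are two nodes (endpoints) serving the API. Unfortunately, on the other node, 192.168.56.102, the kube-apiserver could not connect to the etcd instance running on that node. Whenever the curl request to the kubernetes service IP resolved to 192.168.56.102, I got "No route to host", because that API server could not fetch data from the etcd database on node 2.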
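
One way to confirm which API server is at fault is to bypass the service IP and query each endpoint directly. The sketch below turns the comma-separated endpoint list (in the form printed by `kubectl get ep kubernetes`) into one URL per line. The helper name `endpoint_urls` is hypothetical.

```shell
# Hypothetical helper: turn "ip:port,ip:port" into one /version URL per line.
endpoint_urls() {
  echo "$1" | tr ',' '\n' | while read -r ep; do
    if [ -n "$ep" ]; then
      echo "https://$ep/version"
    fi
  done
}

# Usage: probe each API server directly instead of going through 10.32.0.1.
#   endpoint_urls "192.168.56.101:6443,192.168.56.102:6443" | while read -r url; do
#     if curl -sk --max-time 5 "$url" >/dev/null; then echo "$url ok"; else echo "$url FAILED"; fi
#   done
```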

I removed the second node's etcd member (192.168.56.102:2380) from the --etcd-servers flag in the kube-apiserver command-line arguments, which had been set to:

--etcd-servers=http://192.168.56.101:2379,http://192.168.56.102:2380
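
As an illustration of that change, the sketch below drops one entry from a comma-separated `--etcd-servers` value. The helper name `drop_etcd_server` is hypothetical. The result still has to be written into the kube-apiserver unit file by hand and the service restarted.

```shell
# Hypothetical helper: remove one exact entry from a comma-separated server list.
drop_etcd_server() {
  echo "$1" | tr ',' '\n' | grep -vxF -e "$2" | paste -sd, -
}

# Usage:
#   drop_etcd_server "http://192.168.56.101:2379,http://192.168.56.102:2380" \
#                    "http://192.168.56.102:2380"
```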

I also removed the second node from the endpoints of the kubernetes service:

root@kubem1:~# kubectl get ep
NAME         ENDPOINTS                                 AGE
kubernetes   192.168.56.101:6443,192.168.56.102:6443   22h

root@kubem1:~# kubectl edit ep kubernetes
endpoints/kubernetes edited

root@kubem1:~# kubectl get ep kubernetes
NAME         ENDPOINTS             AGE
kubernetes   192.168.56.101:6443   22h

Now I get the correct curl output every time, without the "No route to host" error.