Master can't connect to cluster
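After a cluster upgrade, one of the three master nodes can't connect back to the cluster. I have an HA cluster running in us-east-1a, us-east-1b and us-east-1c, and my master in us-east-1a is running but can't rejoin the cluster.
I tried scaling the master-us-east-1a instance group down to zero nodes and then back up to one node, but the new EC2 machine starts with the same problem and still can't rejoin the cluster. It seems to start from a backup or something similar.
I tried connecting to the master and restarting services such as protokube and docker, but that didn't solve the problem either.
Connecting to the host over ssh, I noticed that the flannel service is not running on this machine. I tried to run it manually through docker, without success. It seems flannel is the network service that should be running but isn't.
- Can I reset the master in us-east-1a and create it from scratch?
- Any ideas on getting the flannel service running on this master?
Thanks in advance.
Attachments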
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-xxx-xxx-xxx-xxx.ec2.internal Ready node 33d v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal Ready master 33d v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal Ready node 33d v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal Ready master 33d v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal Ready node 33d v1.11.9
-
> sudo systemctl status kubelet
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: I0110 21:00:55.026553 2502 kubelet_node_status.go:441] Recording NodeHasSufficientPID event message for node ip-xxx-xxx-xxx-xxx.ec2.internal
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: I0110 21:00:55.027005 2502 kubelet_node_status.go:79] Attempting to register node ip-xxx-xxx-xxx-xxx.ec2.internal
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: E0110 21:00:55.027764 2502 kubelet_node_status.go:103] Unable to register node "ip-xxx-xxx-xxx-xxx.ec2.internal" with API server: Post https://127.0.0.1/api/v1/nodes: dial tcp 127.0.0.1:443: connect: connection refused
-
> sudo docker logs k8s_kube-apiserver_kube-apiserver-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_134d55c1b1c3bf3583911989a14353da_16
F0110 20:59:35.581865 1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 /registry [http://127.0.0.1:4001] true false 1000 0xc42013c480 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.1:4001: connect: connection refused)
-
> sudo docker version
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 02:31:19 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 02:31:19 2017
OS/Arch: linux/amd64
Experimental: false
-
> kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:40:24Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server 127.0.0.1 was refused - did you specify the right host or port?
-
> sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
protokube 1.15.0 6b00e7216827 7 weeks ago 288 MB
k8s.gcr.io/kube-proxy v1.11.9 e18fcce798b8 9 months ago 98.1 MB
k8s.gcr.io/kube-controller-manager v1.11.9 634ccbd18a0f 9 months ago 155 MB
k8s.gcr.io/kube-apiserver v1.11.9 ef9a84756d40 9 months ago 187 MB
k8s.gcr.io/kube-scheduler v1.11.9 e00d30bd3a71 9 months ago 56.9 MB
k8s.gcr.io/pause-amd64 3.0 99e59f495ffa 3 years ago 747 kB
kopeio/etcd-manager 3.0.20190930 7937b67f722f 50 years ago 656 MB
-
> sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b4eb0ec9e6a2 k8s.gcr.io/kube-scheduler@sha256:372ab1014701f60b67a65d94f94d30d19335294d98746edcdfcb8808ed5aee3c "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_kube-scheduler_kube-scheduler-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_105cd5bac4edf48f265f31eb756b971a_0
8f827dc0eade kopeio/etcd-manager@sha256:cb0ed7c56dadbc0f4cd515906d72b30094229d6e0a9fcb7aa44e23680bf9a3a8 "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_etcd-manager_etcd-manager-main-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_a6a467f6b78a7c7bc15ec1f64799516d_0
5bebb169b8b3 k8s.gcr.io/kube-controller-manager@sha256:aa9b9dac085a65c47746fa8739cf70e9d7e9a356a836ad2ef073da0d7b136db2 "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_kube-controller-manager_kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_564bccf38cd14aa0f647593e69b159ab_0
4467d550824e k8s.gcr.io/kube-proxy@sha256:a63c81fe4d3e9575cc0a29c4866a2975b01a07c0f473ab2cf1e88ebf78739f80 "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_kube-proxy_kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_22cd6fe287e6f4bae556504b3245f385_0
0a5c23006e18 kopeio/etcd-manager@sha256:cb0ed7c56dadbc0f4cd515906d72b30094229d6e0a9fcb7aa44e23680bf9a3a8 "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_etcd-manager_etcd-manager-events-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_9f2a8de168741a0263161532f42e97b4_0
3efa9ae55618 k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_22cd6fe287e6f4bae556504b3245f385_0
4e451bc007ac k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_kube-scheduler-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_105cd5bac4edf48f265f31eb756b971a_0
7c5c301e034a k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_kube-apiserver-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_134d55c1b1c3bf3583911989a14353da_0
d88f075fa61f k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_etcd-manager-main-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_a6a467f6b78a7c7bc15ec1f64799516d_0
69e8844e9c14 k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_564bccf38cd14aa0f647593e69b159ab_0
05e67c2e8f98 k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_etcd-manager-events-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_9f2a8de168741a0263161532f42e97b4_0
eee0a4d563c0 protokube:1.15.0 "/usr/bin/protokub..." 15 hours ago Up 15 hours hungry_shirley
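Kubelet is trying to register the us-east-1a master node with the API server endpoint https://127.0.0.1:443. I believe this should instead be the API server endpoint of one of the other two masters. Kubelet uses the kubelet.conf file to talk to the API server when registering the node. Change the server entry in the kubelet.conf file located in /etc/kubernetes to point to one of the following:
- The Elastic IP or public IP of the us-east-1b or us-east-1c master node, e.g. https://xx.xx.xx.xx:6443
- The private IP of the us-east-1b or us-east-1c master node, e.g. https://xx.xx.xx.xx:6443
- The FQDN of the master nodes, if you have a load balancer in front of the master nodes running the Kubernetes API server.
After changing kubelet.conf, restart kubelet.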
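As a rough sketch of the edit described above, the helper below rewrites the server entry of a kubeconfig-style file and keeps a backup. The function name set_api_server is made up for illustration, the path and endpoint in the usage comment are the placeholders from this answer, and it assumes GNU sed and a file with a single server entry. It also assumes the endpoint contains no `|` or `&` characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kops or kubelet): point the `server:`
# entry of a kubeconfig-style file at a different API server endpoint.
# Usage: set_api_server <kubeconfig-path> <https://host:port>
set_api_server() {
  local conf="$1" endpoint="$2"
  # Keep a backup next to the original so the change can be reverted.
  cp "$conf" "$conf.bak"
  # Replace everything after `server:` while preserving the indentation.
  sed -i -E "s|^([[:space:]]*server:[[:space:]]*).*$|\1${endpoint}|" "$conf"
}

# Example, to be run as root on the affected master. The path and the
# endpoint are assumptions taken from the answer text:
#   set_api_server /etc/kubernetes/kubelet.conf https://xx.xx.xx.xx:6443
#   systemctl restart kubelet
```

If registration still fails after the restart, copying the .bak file back restores the previous configuration.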
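Edit: Since you are using etcd-manager, you can try the troubleshooting steps for "Kubernetes service unavailable / flannel issues" described here
Can you verify whether the etcd service is running and online on us-east-1a?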
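One possible way to check this from the us-east-1a master is to test whether anything accepts connections on the etcd client port that the kube-apiserver log above is dialing (127.0.0.1:4001). This is only a sketch: port_open is a made-up helper name, and it relies on bash's /dev/tcp feature and the coreutils timeout command being available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
port_open() {
  local host="$1" port="$2"
  # bash opens a TCP socket when redirecting to /dev/tcp/<host>/<port>;
  # timeout aborts the attempt after 3 seconds.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# 4001 is the etcd client port taken from the kube-apiserver error above.
if port_open 127.0.0.1 4001; then
  echo "something is listening on 127.0.0.1:4001"
else
  echo "nothing is accepting connections on 127.0.0.1:4001"
fi

# Follow-up commands to run on the host (container name is a placeholder):
#   sudo docker ps --filter name=etcd-manager
#   sudo docker logs --tail 50 <etcd-manager-main container id>
```

If the port is closed while the etcd-manager containers show as Up, their logs are the next place to look.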