Rancher Kubernetes 仪表板 - 服务不可用

Rancher Kubernetes Dashboard - Service Unavailable

总的来说,我是 Rancher 和容器的新手。使用 Rancher 设置 Kubernetes 集群时,我在访问 Kubernetes 仪表板时遇到问题。

rancher/server: 1.6.6

Single node Rancher server + External MySQL + 3 agent nodes

Infrastructure Stack versions:
healthcheck: v0.3.1
ipsec: net:v0.11.5
network-services: metadata:v0.9.2 / network-manager:v0.7.7
scheduler: k8s:v1.7.2-rancher5
kubernetes (if applicable): kubernetes-agent:v0.6.3


# docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 17.03.1-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.34-rancher
Operating System: RancherOS v1.0.3
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.798 GiB
Name: ch7radod1
ID: IUNS:4WT2:Y3TV:2RI4:FZQO:4HYD:YSNN:6DPT:HMQ6:S2SI:OPGH:TX4Y
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://proxy.ch.abc.net:8080
Https Proxy: http://proxy.ch.abc.net:8080
No Proxy: localhost,.xyz.net,abc.net
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

访问UIURLhttp://10.216.30.10/r/projects/1a6633/kubernetes-dashboard:9090/#显示“服务不可用”

如果我使用 UI 中的 CLI 部分,我会得到以下信息:

> kubectl get nodes
NAME              STATUS    AGE       VERSION
ch7radod3       Ready     1d        v1.7.2
ch7radod4       Ready     5d        v1.7.2
ch7radod1       Ready     1d        v1.7.2

> kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE
kube-system   heapster-4285517626-4njc2              0/1       ContainerCreating   0          5d
kube-system   kube-dns-3942128195-ft56n              0/3       ContainerCreating   0          19d
kube-system   kube-dns-646531078-z5lzs               0/3       ContainerCreating   0          5d
kube-system   kubernetes-dashboard-716739405-lpj38   0/1       ContainerCreating   0          5d
kube-system   monitoring-grafana-3552275057-qn0zf    0/1       ContainerCreating   0          5d
kube-system   monitoring-influxdb-4110454889-79pvk   0/1       ContainerCreating   0          5d
kube-system   tiller-deploy-737598192-f9gcl          0/1       ContainerCreating   0          5d

设置使用私有注册表 (Artifactory)。我检查了 Artifactory,我可以看到一些与 Docker 相关的图像。我正在浏览 private registry section and i also saw this 文件。如果需要此文件,我应该将它保存在哪里,以便 Rancher 可以获取它并配置 Kubernetes 仪表板?

更新:

$ sudo ros engine switch docker-1.12.6
> ERRO[0031] Failed to load https://raw.githubusercontent.com/rancher/os-services/v1.0.3/index.yml: Get https://raw.githubusercontent.com/rancher/os-services/v1.0.3/index.yml: Proxy Authentication Required
> FATA[0031] docker-1.12.6 is not a valid engine

我想可能是因为 NGINX,所以我停止了 NGINX 容器,但我仍然收到上述错误。早些时候我在这个 Rancher 服务器上尝试过相同的命令并且它曾经工作正常。它在代理节点上运行良好,尽管它们已经配置了 1.12.6。

更新 2:

> kubectl -n kube-system get po
NAME                                 READY STATUS            RESTARTS AGE
heapster-4285517626-4njc2            1/1   Running           0        12d
kube-dns-2588877561-26993            0/3   ImagePullBackOff  0        5h
kube-dns-646531078-z5lzs             0/3   ContainerCreating 0        12d
kubernetes-dashboard-716739405-zq3s9 0/1   CrashLoopBackOff  67       5h
monitoring-grafana-3552275057-qn0zf  1/1   Running           0        12d
monitoring-influxdb-4110454889-79pvk 1/1   Running           0        12d
tiller-deploy-737598192-f9gcl        0/1   CrashLoopBackOff  72       12d

None 您的 pods 运行,您需要先解决该问题。尝试重新启动整个集群并在 运行 状态下查看以上所有 pods。

基于@ivan.sim的, i posted 'UPDATE 2'. This started me finally to look in the right direction. I then started looking for CrashLoopBackOff error online and came across this link并尝试了以下命令(使用Rancher控制台的CLI选项),这实际上与@[=24非常相似=] 上面的建议,但这对 node 有帮助,其中仪表板进程是 运行:

> kubectl get pods -a -o wide --all-namespaces
NAMESPACE     NAME                                   READY  STATUS              RESTARTS   AGE  IP                  NODE
kube-system   heapster-4285517626-4njc2              1/1    Running             0          12d  10.42.224.157       radod4
kube-system   kube-dns-2588877561-26993              0/3    ImagePullBackOff    0          5h   <none>              radod1
kube-system   kube-dns-646531078-z5lzs               0/3    ContainerCreating   0          12d  <none>              radod4
kube-system   kubernetes-dashboard-716739405-zq3s9   0/1    Error               70         5h   10.42.218.11        radod1
kube-system   monitoring-grafana-3552275057-qn0zf    1/1    Running             0          12d  10.42.202.44        radod4
kube-system   monitoring-influxdb-4110454889-79pvk   1/1    Running             0          12d  10.42.111.171       radod4
kube-system   tiller-deploy-737598192-f9gcl          0/1    CrashLoopBackOff    76         12d  10.42.213.24        radod4

然后我去了进程正在执行的主机并尝试了以下命令:

[rancher@radod1 ~]$
[rancher@radod1 ~]$ docker ps -a | grep dash
282334b0ed38  gcr.io/google_containers/kubernetes-dashboard-amd64@sha256:b537ce8988510607e95b8d40ac9824523b1f9029e6f9f90e9fccc663c355cf5d  "/dashboard --insecur"   About a minute ago   Exited (1) 55 seconds ago   k8s_kubernetes-dashboard_kubernetes-dashboard-716739405-zq3s9_kube-system_7b0afda7-8271-11e7-ae86-021bfe69c163_72
99836d7824fd  gcr.io/google_containers/pause-amd64:3.0                                                                                     "/pause"                 5 hours ago          Up 5 hours                  k8s_POD_kubernetes-dashboard-716739405-zq3s9_kube-system_7b0afda7-8271-11e7-ae86-021bfe69c163_1
[rancher@radod1 ~]$
[rancher@radod1 ~]$
[rancher@radod1 ~]$ docker logs 282334b0ed38
Using HTTP port: 8443
Creating API server client for https://10.43.0.1:443
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: the server has asked for the client to provide credentials
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/dashboard/blob/master/docs/user-guide/troubleshooting.md

出现上述错误后,我再次上网搜索并尝试了一些东西。最后,this link 帮了忙。我在所有代理节点上执行了以下命令后,Kubernetes dashboard 终于开始工作了!

docker volume rm etcd
rm -rf /var/etcd/backups/*