oc cluster up can't start kube-apiserver due to missing ~/.kube/config

I am trying to run OKD on my desktop (Ubuntu 18). I followed these instructions: https://opensource.com/article/18/11/local-okd-cluster-linux (or similar).

  1. I installed Docker:
$ docker version
...
 Version:           19.03.12
  2. I configured the insecure registry:
$ sudo cat /etc/docker/daemon.json
{
    "insecure-registries" : [ "172.30.0.0/16" ]
}
  3. I restarted the Docker daemon:
$ docker info
...
Insecure Registries:
  172.30.0.0/16
  127.0.0.0/8
  4. I disabled the firewall:
$ sudo ufw status
Status: inactive
  5. I downloaded the OKD client tools:

oc and kubectl, from https://github.com/openshift/origin/releases/download/v3.11.0/openshift-origin-client-tools-v3.11.0-0cbc58b-linux-64bit.tar.gz

$ ./oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
  6. I tried to start the cluster, but it failed:
$ ./oc cluster up
Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Creating shared mount directory on the remote host ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Starting OpenShift using openshift/origin-control-plane:v3.11 ...
I1104 16:54:53.631254    6317 config.go:40] Running "create-master-config"
I1104 16:54:56.048019    6317 config.go:46] Running "create-node-config"
I1104 16:54:57.639381    6317 flags.go:30] Running "create-kubelet-flags"
I1104 16:54:58.559780    6317 run_kubelet.go:49] Running "start-kubelet"
I1104 16:54:58.862023    6317 run_self_hosted.go:181] Waiting for the kube-apiserver to be ready ...

After a long wait:

E1104 16:59:58.864017    6317 run_self_hosted.go:571] API server error: Get https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused ()
Error: timed out waiting for the condition
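
To check the endpoint from the error message by hand, a small polling loop can help. This is only a sketch: `wait_for_healthz` is a hypothetical helper name, the URL is the one from the log above, and the attempt count and delay are arbitrary defaults. It assumes `curl` is installed; `-k` skips certificate verification because the local cluster uses a self-signed certificate.

```shell
# Poll a health endpoint until it answers or the attempts run out.
# wait_for_healthz is a hypothetical helper, not part of oc.
wait_for_healthz() {
    url="${1:-https://127.0.0.1:8443/healthz}"   # URL taken from the log above
    attempts="${2:-30}"                          # assumed default
    delay="${3:-1}"                              # seconds between attempts
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -k: accept self-signed cert, -s: quiet, -f: non-2xx counts as failure
        if curl -k -s -f --max-time 5 "$url" >/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_healthz && echo "API server is up"` can be run in a second terminal while `oc cluster up` is waiting.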

With a higher log level (I know that I have to delete the openshift.local.clusterup directory or pass --base-dir whenever I want a fresh oc cluster up):

$ ./oc cluster up --loglevel=5
...
I1104 17:07:50.991364   14512 run_self_hosted.go:181] Waiting for the kube-apiserver to be ready 
I1104 17:07:50.992053   14512 run_self_hosted.go:557] Server isn't healthy yet.  Waiting a little while. Get https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused
I1104 17:07:51.992467   14512 run_self_hosted.go:557] Server isn't healthy yet.  Waiting a little while. Get https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused
I1104 17:07:52.993484   14512 run_self_hosted.go:557] Server isn't healthy yet.  Waiting a little while. Get https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused
...
I1104 17:08:10.992682   14512 run_self_hosted.go:557] Server isn't healthy yet.  Waiting a little while. Get https://127.0.0.1:8443/healthz?timeout=32s: net/http: TLS handshake timeout
...
error: unable to recognize "/namespace.yaml": Get https://127.0.0.1:8443/api?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused
...
The connection to the server 127.0.0.1:8443 was refused - did you specify the right host or port?
...
E1104 17:08:52.435348   14512 interface.go:34] Failed to install "openshift-service-cert-signer-operator": failed to install "openshift-service-cert-signer-operator": cannot create container using image openshift/origin-cli:v3.11; caused by: cannot create container using image openshift/origin-cli:v3.11
E1104 17:08:53.087022   14512 interface.go:34] Failed to install "kube-dns": failed to install "kube-dns": cannot create container using image openshift/origin-cli:v3.11; caused by: cannot create container using image openshift/origin-cli:v3.11
I1104 17:08:53.087047   14512 interface.go:41] Finished installing "kube-proxy" "kube-dns" "openshift-service-cert-signer-operator" "openshift-apiserver"
Error: [failed to install "kube-proxy": cannot create container using image openshift/origin-cli:v3.11; caused by: cannot create container using image openshift/origin-cli:v3.11, failed to install "openshift-apiserver": cannot create container using image openshift/origin-cli:v3.11; caused by: cannot create container using image openshift/origin-cli:v3.11, failed to install "openshift-service-cert-signer-operator": cannot create container using image openshift/origin-cli:v3.11; caused by: cannot create container using image openshift/origin-cli:v3.11, failed to install "kube-dns": cannot create container using image openshift/origin-cli:v3.11; caused by: cannot create container using image openshift/origin-cli:v3.11]

I tried to find out what was wrong and found that a configuration file was missing.

$ ./oc cluster status
Error: invalid configuration: Missing or incomplete configuration info.  Please login or point to an existing, complete config file:

  1. Via the command-line flag --config
  2. Via the KUBECONFIG environment variable
  3. In your home directory as ~/.kube/config

To view or setup config directly use the 'config' command.

I do not have the environment variable set:

$ echo $KUBECONFIG

I do not have a ~/.kube/config file:

$ cat ~/.kube/config
cat: /home/my-username/.kube/config: No such file or directory
$ ls ~/.kube/
ls: cannot access '/home/my-username/.kube/': No such file or directory

I know that oc cluster up should create ~/.kube/config, but in my case it does not.
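
The two checks above can be combined into one helper that reports which configuration source would be picked up. This is a minimal sketch: `kubeconfig_source` is a hypothetical name, and it only covers the KUBECONFIG variable and the ~/.kube/config file from the error message, not the --config flag.

```shell
# Report which kubeconfig source is present, in the order listed by the error:
# the KUBECONFIG environment variable first, then ~/.kube/config.
# kubeconfig_source is a hypothetical helper, not part of oc or kubectl.
kubeconfig_source() {
    home_dir="${1:-$HOME}"   # overridable so the check can be tried on another directory
    if [ -n "${KUBECONFIG:-}" ]; then
        echo "env: $KUBECONFIG"
    elif [ -f "$home_dir/.kube/config" ]; then
        echo "file: $home_dir/.kube/config"
    else
        echo "none"
    fi
}
```

Calling `kubeconfig_source` with no arguments checks the current user's home directory.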


Because of the missing configuration, even kubectl does not work (I think it should behave like a typical kubectl binary install, https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-binary-with-curl-on-linux):

$ ./kubectl version --client
error: no configuration has been provided
$ ./kubectl config view
apiVersion: v1
clusters: []
contexts: []
current-context: ""
kind: Config
preferences: {}
users: []

I solved the problem.

I had disabled the firewall (ufw) on Ubuntu, but iptables was still active.

The command sudo iptables -L showed me a lot of rules, including these four:

Chain KUBE-SERVICES (1 references)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             172.30.237.36        /* default/router:80-tcp has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             172.30.1.1           /* default/docker-registry:5000-tcp has no endpoints */ tcp dpt:5000 reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             172.30.237.36        /* default/router:443-tcp has no endpoints */ tcp dpt:https reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             172.30.237.36        /* default/router:1936-tcp has no endpoints */ tcp dpt:1936 reject-with icmp-port-unreachable

I do not know where they came from (my guess is below). I decided to delete them.

sudo iptables -L --line-numbers
sudo iptables -D KUBE-SERVICES 1
sudo iptables -D KUBE-SERVICES 1
sudo iptables -D KUBE-SERVICES 1
sudo iptables -D KUBE-SERVICES 1
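
Deleting by position only works while those four rules are the first ones in the chain. As an alternative sketch, the rules can be matched by their "has no endpoints" comment and deleted by rule specification instead. `stale_reject_deletes` is a hypothetical helper name; it assumes the input is the output of `iptables -S KUBE-SERVICES`, and it only prints the delete commands so they can be reviewed before anything is run.

```shell
# Read `iptables -S KUBE-SERVICES` output on stdin and print one
# `iptables -D ...` command per REJECT rule tagged "has no endpoints".
# stale_reject_deletes is a hypothetical helper; it does not modify iptables itself.
stale_reject_deletes() {
    awk '/^-A KUBE-SERVICES / && /has no endpoints/ && /-j REJECT/ {
        sub(/^-A /, "iptables -D ")   # turn the append spec into a delete spec
        print
    }'
}
```

Typical use would be `sudo iptables -S KUBE-SERVICES | stale_reject_deletes` to inspect the commands, and then running them with sudo once they look right.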

I then rebooted the system (to make sure iptables was reloaded).

After that, ./oc cluster up started successfully and created ~/.kube/config:

Server Information ...
OpenShift server started.

The server is accessible via web console at:
    https://127.0.0.1:8443

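My guess is that the rules came from oc cluster up itself, from when I first ran it without having added "insecure-registries" : [ "172.30.0.0/16" ] to /etc/docker/daemon.json (I was trying to check whether that setting is mandatory).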
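
To avoid repeating that mistake, the registry setting can be checked before starting the cluster. This is a rough sketch: `has_insecure_registry` is a hypothetical helper name, and it does a plain text match on the file rather than parsing the JSON, so it can be fooled by unusual formatting.

```shell
# Succeed if the Docker daemon config mentions the given CIDR as an insecure registry.
# has_insecure_registry is a hypothetical helper; this is a text match, not a JSON parser.
has_insecure_registry() {
    config_file="${1:-/etc/docker/daemon.json}"
    cidr="${2:-172.30.0.0/16}"
    [ -r "$config_file" ] \
        && grep -q '"insecure-registries"' "$config_file" \
        && grep -qF "\"$cidr\"" "$config_file"
}
```

For example: `has_insecure_registry && ./oc cluster up`. The file may need to be readable by the current user for the check to pass.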