Lost my OpenShift console ("Application is not available")

Question

The console UI in my OpenShift 4.5.x installation has mysteriously stopped working. Visiting the console URL now results in this message:

Application is not available

The application is currently not serving requests at this endpoint. It may not have been started or is still starting.

You typically see this when a route exists but the corresponding service or pod cannot be found. In this case, however, the route exists:

$ oc -n openshift-console get route
NAME        HOST/PORT                                             PATH   SERVICES    PORT    TERMINATION          WILDCARD
console     console-openshift-console.apps.example.com            console     https   reencrypt/Redirect   None
downloads   downloads-openshift-console.apps.example.com          downloads   http    edge/Redirect        None

The service exists:

$ oc -n openshift-console get service
NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
console     ClusterIP   172.30.36.70     <none>        443/TCP   57d
downloads   ClusterIP   172.30.190.186   <none>        80/TCP    57d

And the pods exist and are running (though the console pods show 0/1 ready):

$ oc -n openshift-console get pods
NAME                       READY   STATUS    RESTARTS   AGE
console-76c8d7d755-gtfm8   0/1     Running   1          4m12s
console-76c8d7d755-mvf6n   0/1     Running   1          4m12s
downloads-9656c996-mmqhk   1/1     Running   0          53d
downloads-9656c996-z2khj   1/1     Running   0          53d
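
The READY column above is worth a closer look: a pod can be in the Running state while none of its containers pass their readiness probe, in which case the service has no endpoints to route to. As a minimal sketch, the helper below (the name `not_ready_pods` is hypothetical, not part of `oc`) reads the table output of `oc get pods` on stdin and prints the pods whose ready count is lower than their container count.

```shell
# Hypothetical helper: print the names of pods whose READY column
# (ready/total) shows fewer ready containers than total containers.
# Expects the default table output of `oc get pods` on stdin.
not_ready_pods() {
  awk 'NR > 1 {              # skip the header row
    split($2, r, "/")        # READY looks like "0/1"
    if (r[1] != r[2]) print $1
  }'
}

# Usage against a live cluster:
#   oc -n openshift-console get pods | not_ready_pods
```

Run against the listing above, this would flag the two console pods and skip the two downloads pods.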

Looking at the logs for the console pods, there appears to be a problem contacting the OAuth service:

2021-01-04T22:05:48Z auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-04T22:05:58Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: EOF
2021-01-04T22:06:13Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-04T22:06:23Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: EOF
2021-01-04T22:06:38Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-04T22:06:53Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
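
The log lines show the console sending a HEAD request to the OAuth route and getting either EOF or a timeout. Repeating that request by hand, for example from a workstation or a cluster node, can help tell whether the route is unreachable in general or only from the console pods. The following is a rough sketch, assuming `curl` is available; the function name `probe_head` is hypothetical, and `-k` skips TLS verification on the assumption that the route's certificate may not be trusted where the command runs.

```shell
# Hypothetical helper: send a HEAD request to the given URL and print
# only the HTTP status code. curl prints 000 when no HTTP response
# arrives at all (timeout, connection reset, EOF).
probe_head() {
  curl -k -s -I -o /dev/null -w '%{http_code}\n' --max-time 10 "$1"
}

# Usage, with the host taken from the log lines above:
#   probe_head https://oauth-openshift.apps.example.com
```

A result of 000 for a host that is served through the router, while the pods behind it look healthy, would point toward the ingress layer rather than the OAuth server itself.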

But the pods in the openshift-authentication namespace appear healthy and report no errors in their logs. Where should I look for the source of the problem?

The expected route and service exist in the openshift-authentication namespace:

$ oc -n openshift-authentication get route
NAME              HOST/PORT                                 PATH   SERVICES          PORT   TERMINATION            WILDCARD
oauth-openshift   oauth-openshift.apps.example.com          oauth-openshift   6443   passthrough/Redirect   None

$ oc -n openshift-authentication get service
NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
oauth-openshift   ClusterIP   172.30.233.202   <none>        443/TCP   57d

$ oc -n openshift-authentication get route oauth-openshift -o json | jq .status
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2020-11-08T19:48:08Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.example.com",
      "routerCanonicalHostname": "apps.example.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}

Answer 1

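This turned out to be a problem with the default ingress router. There were no obvious errors, but I was able to resolve the issue by restarting the router: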

oc -n openshift-ingress get pod -o json |
  jq -r '.items[].metadata.name' |
  xargs oc -n openshift-ingress delete pod
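
A possible alternative to deleting the pods one by one is to let the deployment roll them. This is only a sketch under assumptions: it presumes the default ingress controller is backed by a deployment named `router-default` in the openshift-ingress namespace and that the installed `oc` supports `rollout restart`. The wrapper name `restart_default_router` is hypothetical.

```shell
# Hypothetical wrapper: restart the default router pods through their
# deployment, then wait until the rollout has finished.
# Assumes the deployment is called router-default.
restart_default_router() {
  oc -n openshift-ingress rollout restart deployment/router-default &&
    oc -n openshift-ingress rollout status deployment/router-default
}

# Usage:
#   restart_default_router
```

Compared with deleting every pod at once, a rollout replaces the pods according to the deployment's update strategy.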

Answer 2

I ran into the same problem on OpenShift 3.11.

I simply deleted the secret that holds the certificate. OpenShift created a new secret, and the console now works.

oc delete secret console-serving-cert -n openshift-console
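
To confirm that the secret really has been recreated after the delete, it can be polled until it shows up again. The helper below is a minimal sketch; the name `wait_for_secret`, the default retry count, and the two-second interval are all illustrative choices rather than anything prescribed by OpenShift.

```shell
# Hypothetical helper: poll until the named secret exists again.
# Arguments: namespace, secret name, optional number of attempts.
# Returns 0 once the secret is found, 1 if the attempts run out.
wait_for_secret() {
  ns=$1
  name=$2
  tries=${3:-30}
  while [ "$tries" -gt 0 ]; do
    if oc -n "$ns" get secret "$name" >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}

# Usage:
#   wait_for_secret openshift-console console-serving-cert
```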
