Unable to delete Kubernetes namespace - removing finalizers fails

I have a namespace in my Kubernetes cluster that I cannot delete. When I run kubectl get ns traefik -o yaml, I get the following:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    cattle.io/status: '{"Conditions":[{"Type":"ResourceQuotaInit","Status":"True","Message":"","LastUpdateTime":"2021-06-11T20:28:59Z"},{"Type":"InitialRolesPopulated","Status":"True","Message":"","LastUpdateTime":"2021-06-11T20:29:00Z"}]}'
    field.cattle.io/projectId: c-5g2hz:p-bl9sf
    lifecycle.cattle.io/create.namespace-auth: "true"
  creationTimestamp: "2021-06-11T20:28:58Z"
  deletionTimestamp: "2021-07-04T07:21:20Z"
  labels:
    field.cattle.io/projectId: p-bl9sf
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:field.cattle.io/projectId: {}
        f:labels:
          .: {}
          f:field.cattle.io/projectId: {}
      f:status:
        f:phase: {}
    manager: agent
    operation: Update
    time: "2021-06-11T20:28:58Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cattle.io/status: {}
          f:lifecycle.cattle.io/create.namespace-auth: {}
    manager: rancher
    operation: Update
    time: "2021-06-11T20:28:58Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .: {}
          k:{"type":"NamespaceContentRemaining"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceDeletionContentFailure"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceDeletionDiscoveryFailure"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceDeletionGroupVersionParsingFailure"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceFinalizersRemaining"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-07-04T07:21:26Z"
  name: traefik
  resourceVersion: "15400692"
  uid: 4b198956-bbd5-4bdb-9dc6-9d53feda91e4
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2021-07-04T07:21:25Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently
      unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2021-07-04T07:21:26Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2021-07-04T07:21:26Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2021-07-04T07:21:26Z"
    message: All content successfully removed
    reason: ContentRemoved
    status: "False"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2021-07-04T07:21:26Z"
    message: All content-preserving finalizers finished
    reason: ContentHasNoFinalizers
    status: "False"
    type: NamespaceFinalizersRemaining
  phase: Terminating
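A dump like this can be filtered programmatically to find the condition that blocks deletion. This is a minimal sketch. It assumes the namespace was exported as JSON (kubectl get ns traefik -o json) rather than YAML, so that only the standard library is needed. The function name blocking_conditions is hypothetical. For the namespace deletion condition types shown above, a status of "True" means that problem is currently present.

```python
import json


def blocking_conditions(namespace: dict) -> list[tuple[str, str]]:
    """Return (type, message) for every status condition whose status is "True".

    Assumption: `namespace` is the parsed output of `kubectl get ns <name> -o json`.
    For types such as NamespaceDeletionDiscoveryFailure or NamespaceContentRemaining,
    "True" means that problem is present; "False" means it is not.
    """
    conditions = namespace.get("status", {}).get("conditions", [])
    return [
        (cond["type"], cond.get("message", ""))
        for cond in conditions
        if cond.get("status") == "True"
    ]


def load_namespace(path: str) -> dict:
    """Read a namespace object previously saved with `kubectl get ns <name> -o json > path`."""
    with open(path) as fh:
        return json.load(fh)
```

Usage: run blocking_conditions(load_namespace("ns.json")). For the status above, only the NamespaceDeletionDiscoveryFailure entry would be returned.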

When I run kubectl delete ns traefik --v=10, the output ends with:

I0708 18:38:26.538676   31537 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json" -H "User-Agent: kubectl/v1.20.2 (linux/amd64) kubernetes/faecb19" 'http://127.0.0.1:44427/6614317c-41da-462b-8be3-c6cda2f0df24/api/v1/namespaces?fieldSelector=metadata.name%3Dtraefik&resourceVersion=17101173&watch=true'
I0708 18:38:27.013394   31537 round_trippers.go:445] GET http://127.0.0.1:44427/6614317c-41da-462b-8be3-c6cda2f0df24/api/v1/namespaces?fieldSelector=metadata.name%3Dtraefik&resourceVersion=17101173&watch=true 200 OK in 474 milliseconds
I0708 18:38:27.013421   31537 round_trippers.go:451] Response Headers:
I0708 18:38:27.013427   31537 round_trippers.go:454]     Access-Control-Allow-Origin: *
I0708 18:38:27.013450   31537 round_trippers.go:454]     Date: Thu, 08 Jul 2021 16:38:27 GMT
I0708 18:38:27.013453   31537 round_trippers.go:454]     Connection: keep-alive
I0708 18:38:27.013468   31537 request.go:708] Unexpected content type from the server: "": mime: no media type

I have already tried removing the finalizers as described in https://www.ibm.com/docs/en/cloud-private/3.2.0?topic=console-namespace-is-stuck-in-terminating-state, but after a few seconds I only get EOF:

> curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/traefik/finalize
EOF
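In the command above, tmp.json is expected to be the Namespace object with the kubernetes entry removed from spec.finalizers. The URL on port 8001 suggests it is sent through kubectl proxy, whose default port is 8001. Below is a minimal sketch of producing that file from the JSON export. The helper names finalize_payload and write_tmp_json are hypothetical, and the input is assumed to come from kubectl get ns traefik -o json.

```python
import copy
import json


def finalize_payload(namespace: dict, finalizer: str = "kubernetes") -> dict:
    """Return a copy of the Namespace object with `finalizer` removed from spec.finalizers.

    The input dict is left untouched (deep copy), so it can still be inspected afterwards.
    """
    payload = copy.deepcopy(namespace)
    spec = payload.setdefault("spec", {})
    spec["finalizers"] = [f for f in spec.get("finalizers", []) if f != finalizer]
    return payload


def write_tmp_json(namespace: dict, path: str = "tmp.json") -> None:
    """Write the finalize payload to `path` for use with `curl ... --data-binary @tmp.json`."""
    with open(path, "w") as fh:
        json.dump(finalize_payload(namespace), fh)
```

Forcing finalization this way skips the namespace controller's cleanup, so any namespaced objects that were not deleted may be left behind.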

Does anyone know how to delete the traefik namespace?

Posting this as a Community Wiki answer based on the comments. Feel free to edit and expand it.

Analyzing the status of the affected namespace shows that this part is the main cause of the problem:

message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
  complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently
  unable to handle the request'

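The problem is the metrics server in Kubernetes. The namespace controller cannot discover the metrics.k8s.io/v1beta1 API group, so it does not finish deleting the namespace and the kubernetes finalizer stays in place. Once the metrics server is available again, the namespace is unblocked and can be deleted.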
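To confirm this diagnosis, you can map the failing group/version in the condition message to its APIService and check whether that APIService reports Available. This is a sketch under assumptions. APIService objects are named version.group (here v1beta1.metrics.k8s.io). The input is assumed to come from kubectl get apiservice v1beta1.metrics.k8s.io -o json. The helper names are hypothetical, and the regex only targets messages shaped like the one above.

```python
import re

# Matches "group/version:" fragments such as "metrics.k8s.io/v1beta1:" in the
# NamespaceDeletionDiscoveryFailure message (assumed message format).
_GROUP_VERSION = re.compile(r"([\w.-]+/v\w+):")


def failing_group_versions(message: str) -> list[str]:
    """Extract the group/version strings named in a discovery-failure message."""
    return _GROUP_VERSION.findall(message)


def apiservice_name(group_version: str) -> str:
    """Convert "metrics.k8s.io/v1beta1" into the APIService name "v1beta1.metrics.k8s.io"."""
    group, version = group_version.split("/", 1)
    return f"{version}.{group}"


def apiservice_available(apiservice: dict) -> bool:
    """Return True only if the APIService has an Available condition with status "True"."""
    for cond in apiservice.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return cond.get("status") == "True"
    return False
```

If apiservice_available returns False for v1beta1.metrics.k8s.io, the metrics server behind it is not serving requests. Fixing it should let the namespace controller complete the deletion.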

A similar issue has been resolved in .