Nginx-ingress-controller fails to start after AKS upgrade to v1.22

We upgraded our Kubernetes cluster from v1.21 to v1.22. After this operation, the pods of our nginx-ingress-controller deployment fail to start with the following error message:

pkg/mod/k8s.io/client-go@v0.18.5/tools/cache/reflector.go:125: Failed to list *v1beta1.Ingress: the server could not find the requested resource

We found that this issue is already being tracked here: https://github.com/bitnami/charts/issues/7264

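Because Azure does not allow downgrading the cluster back to 1.21, could you help us fix the nginx-ingress-controller deployment? We are not very familiar with Helm, so could you be specific about what should be done and from where (local machine, Azure CLI, etc.)?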

This is the current YAML of our deployment:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx-ingress-controller
  namespace: ingress
  uid: 575c7699-1fd5-413e-a81d-b183f8822324
  resourceVersion: '166482672'
  generation: 16
  creationTimestamp: '2020-10-10T10:20:07Z'
  labels:
    app: nginx-ingress
    app.kubernetes.io/component: controller
    app.kubernetes.io/managed-by: Helm
    chart: nginx-ingress-1.41.1
    heritage: Helm
    release: nginx-ingress
  annotations:
    deployment.kubernetes.io/revision: '2'
    meta.helm.sh/release-name: nginx-ingress
    meta.helm.sh/release-namespace: ingress
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:replicas: {}
      subresource: scale
    - manager: Go-http-client
      operation: Update
      apiVersion: apps/v1
      time: '2020-10-10T10:20:07Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/component: {}
            f:app.kubernetes.io/managed-by: {}
            f:chart: {}
            f:heritage: {}
            f:release: {}
        f:spec:
          f:progressDeadlineSeconds: {}
          f:revisionHistoryLimit: {}
          f:selector: {}
          f:strategy:
            f:rollingUpdate:
              .: {}
              f:maxSurge: {}
              f:maxUnavailable: {}
            f:type: {}
          f:template:
            f:metadata:
              f:labels:
                .: {}
                f:app: {}
                f:app.kubernetes.io/component: {}
                f:component: {}
                f:release: {}
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  .: {}
                  f:args: {}
                  f:env:
                    .: {}
                    k:{"name":"POD_NAME"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                    k:{"name":"POD_NAMESPACE"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:livenessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:name: {}
                  f:ports:
                    .: {}
                    k:{"containerPort":80,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                    k:{"containerPort":443,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                  f:readinessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:resources:
                    .: {}
                    f:limits: {}
                    f:requests: {}
                  f:securityContext:
                    .: {}
                    f:allowPrivilegeEscalation: {}
                    f:capabilities:
                      .: {}
                      f:add: {}
                      f:drop: {}
                    f:runAsUser: {}
                  f:terminationMessagePath: {}
                  f:terminationMessagePolicy: {}
              f:dnsPolicy: {}
              f:restartPolicy: {}
              f:schedulerName: {}
              f:securityContext: {}
              f:serviceAccount: {}
              f:serviceAccountName: {}
              f:terminationGracePeriodSeconds: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-24T01:23:22Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:type: {}
            k:{"type":"Progressing"}:
              .: {}
              f:type: {}
    - manager: Mozilla
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:18:41Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:template:
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  f:resources:
                    f:limits:
                      f:cpu: {}
                      f:memory: {}
                    f:requests:
                      f:cpu: {}
                      f:memory: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:29:49Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:deployment.kubernetes.io/revision: {}
        f:status:
          f:conditions:
            k:{"type":"Available"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
            k:{"type":"Progressing"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
          f:observedGeneration: {}
          f:replicas: {}
          f:unavailableReplicas: {}
          f:updatedReplicas: {}
      subresource: status
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-ingress
      app.kubernetes.io/component: controller
      release: nginx-ingress
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx-ingress
        app.kubernetes.io/component: controller
        component: controller
        release: nginx-ingress
    spec:
      containers:
        - name: nginx-ingress-controller
          image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1
          args:
            - /nginx-ingress-controller
            - '--default-backend-service=ingress/nginx-ingress-default-backend'
            - '--election-id=ingress-controller-leader'
            - '--ingress-class=nginx'
            - '--configmap=ingress/nginx-ingress-controller'
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          resources:
            limits:
              cpu: 300m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            runAsUser: 101
            allowPrivilegeEscalation: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
      dnsPolicy: ClusterFirst
      serviceAccountName: nginx-ingress
      serviceAccount: nginx-ingress
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 16
  replicas: 3
  updatedReplicas: 2
  unavailableReplicas: 3
  conditions:
    - type: Available
      status: 'False'
      lastUpdateTime: '2022-01-28T22:58:07Z'
      lastTransitionTime: '2022-01-28T22:58:07Z'
      reason: MinimumReplicasUnavailable
      message: Deployment does not have minimum availability.
    - type: Progressing
      status: 'False'
      lastUpdateTime: '2022-01-28T23:29:49Z'
      lastTransitionTime: '2022-01-28T23:29:49Z'
      reason: ProgressDeadlineExceeded
      message: >-
        ReplicaSet "nginx-ingress-controller-59d9f94677" has timed out
        progressing.

Kubernetes 1.22 is supported only by NGINX Ingress Controller 1.0.0 and higher: https://github.com/kubernetes/ingress-nginx#support-versions-table
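The following is a minimal Python sketch of how to read that support table. It compares the controller image tag from the deployment above against the 1.0.0 minimum. It assumes a plain vMAJOR.MINOR.PATCH tag with no digest and no pre-release suffix. The helper names are hypothetical.

```python
# Minimal sketch: does an ingress-nginx controller image meet the minimum
# version listed for Kubernetes 1.22 (1.0.0)? Helper names are hypothetical.
# Assumes a plain "vMAJOR.MINOR.PATCH" tag: no digest, no pre-release suffix.

MIN_CONTROLLER_FOR_K8S_1_22 = (1, 0, 0)


def parse_tag(tag):
    """Turn a tag such as 'v0.34.1' into a comparable tuple like (0, 34, 1)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def image_tag(image):
    """Return the part after the last ':' of an image reference."""
    return image.rsplit(":", 1)[1]


def supports_k8s_1_22(image):
    """True if the image tag is at least the 1.0.0 minimum."""
    return parse_tag(image_tag(image)) >= MIN_CONTROLLER_FOR_K8S_1_22


old = "us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1"
new = "us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v1.1.1"
print(supports_k8s_1_22(old))  # False
print(supports_k8s_1_22(new))  # True
```

Tuples compare element by element, so (0, 34, 1) sorts below (1, 0, 0). Comparing the raw strings would get this wrong.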

You need to upgrade your nginx-ingress-controller Bitnami Helm chart to version 9.0.0 in Chart.yaml. Then run helm upgrade nginx-ingress-controller bitnami/nginx-ingress-controller.
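Suppose the controller is pulled into your own chart as a dependency. Then the Chart.yaml change amounts to bumping a single version field. The minimal sketch below assumes a Helm 3 (apiVersion: v2) Chart.yaml that has already been parsed into a dict, for example with PyYAML. The wrapper chart name my-app, the old version number and the helper name are all hypothetical. After editing the real file, you would run helm dependency update before helm upgrade.

```python
# Minimal sketch: bump the version of one dependency in a parsed Helm 3
# Chart.yaml (apiVersion: v2). "my-app" and the old version are hypothetical.
import copy


def bump_dependency(chart, dep_name, new_version):
    """Return a copy of chart with dependency dep_name set to new_version."""
    updated = copy.deepcopy(chart)
    for dep in updated.get("dependencies", []):
        if dep.get("name") == dep_name:
            dep["version"] = new_version
            return updated
    raise KeyError(f"dependency {dep_name!r} not found in Chart.yaml")


chart_yaml = {
    "apiVersion": "v2",
    "name": "my-app",
    "version": "0.1.0",
    "dependencies": [
        {
            "name": "nginx-ingress-controller",
            "version": "7.6.21",  # hypothetical old version
            "repository": "https://charts.bitnami.com/bitnami",
        }
    ],
}

bumped = bump_dependency(chart_yaml, "nginx-ingress-controller", "9.0.0")
print(bumped["dependencies"][0]["version"])  # 9.0.0
```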

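You should also update your ingress controller regularly. v0.34.1 is very old, and the ingress is usually the only entry point into your cluster from the outside.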

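@Philip Welz's answer is of course correct. Since the v1beta1 Ingress API version was removed in Kubernetes v1.22, the ingress controller has to be upgraded. But that was not the only problem we faced. So I decided to write a "very, very short" guide on how we finally ended up with a healthy, running cluster (after 5 days), in the hope that it saves someone else the struggle.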

1. Upgrading the nginx-ingress-controller version in the YAML file

Here we simply changed the image version in the YAML file from:

image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1

to:

image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v1.1.1

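After this, a new pod running v1.1.1 was created. It started fine and ran healthy. Unfortunately, that did not bring our microservices back online. I now know this was probably because the existing ingress YAML files needed some changes to be compatible with the new version of the ingress controller. So go straight to step 2 now (two headers below).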
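A quick way to see which existing ingress files need attention is to look for Ingress objects that still use an API version removed in 1.22. The minimal sketch below assumes the manifests are already parsed into dicts, for example with yaml.safe_load_all. The function name is hypothetical.

```python
# Minimal sketch: list Ingress objects that still use an apiVersion removed in
# Kubernetes 1.22. Manifests are assumed to be already-parsed dicts (for
# example from yaml.safe_load_all); the function name is hypothetical.

REMOVED_INGRESS_APIS = {"extensions/v1beta1", "networking.k8s.io/v1beta1"}


def find_removed_ingress_apis(manifests):
    """Return 'namespace/name' for every Ingress on a removed apiVersion."""
    hits = []
    for manifest in manifests:
        if (manifest.get("kind") == "Ingress"
                and manifest.get("apiVersion") in REMOVED_INGRESS_APIS):
            meta = manifest.get("metadata", {})
            # Shown as "default" when the file omits it; kubectl would really
            # use the namespace of the current context.
            namespace = meta.get("namespace", "default")
            hits.append(f"{namespace}/{meta.get('name')}")
    return hits


manifests = [
    {"kind": "Ingress", "apiVersion": "networking.k8s.io/v1beta1",
     "metadata": {"name": "example-api", "namespace": "services"}},
    {"kind": "Ingress", "apiVersion": "networking.k8s.io/v1",
     "metadata": {"name": "hello-world-ingress", "namespace": "services"}},
]
print(find_removed_ingress_apis(manifests))  # ['services/example-api']
```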

Don't do this step yet, only do it if step 2 fails for you: reinstalling nginx-ingress-controller

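We decided that in that case we would reinstall the controller from scratch, following Microsoft's official documentation: https://docs.microsoft.com/en-us/azure/aks/ingress-basic?tabs=azure-cli. Note that this may change the external IP address of your ingress controller. In our case, the easiest way was to delete the whole ingress namespace: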

kubectl delete namespace ingress

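Unfortunately, this did not delete the ingress class (IngressClass is a cluster-scoped resource), so an additional command was needed: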

kubectl delete ingressclass nginx

Then install the new controller:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx --create-namespace --namespace ingress 

If you reinstalled nginx-ingress-controller, or the IP address changed after the upgrade in step 1: update your network security groups, load balancers and domain DNS

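Your AKS resource group should contain a resource of type Network security group. It holds inbound and outbound security rules (as I understand it, it works as a firewall). There should be a default network security group that Kubernetes manages automatically, and the IP addresses in it should be refreshed automatically.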

Unfortunately, we also had an additional custom network security group, where we had to update the rules manually.
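For a custom network security group, one way to find the rules that still point at the old address is to filter the rule list. The minimal sketch below assumes the JSON shape printed by az network nsg rule list, that is, rules with name, destinationAddressPrefix and destinationAddressPrefixes fields. The helper names are hypothetical, and the IP addresses are documentation examples.

```python
# Minimal sketch: find NSG rules whose destination covers the old ingress IP.
# Rules are assumed to look like `az network nsg rule list` output; helper
# names and IPs are hypothetical.
import ipaddress
import json


def _covers(prefix, ip):
    """True if prefix (plain IP or CIDR) contains ip."""
    try:
        return ip in ipaddress.ip_network(prefix, strict=False)
    except ValueError:
        # Service tags such as "Internet" or "*" are not IP prefixes.
        return False


def stale_rules(rules, old_ip):
    """Names of rules whose destination prefix(es) contain old_ip."""
    ip = ipaddress.ip_address(old_ip)
    names = []
    for rule in rules:
        prefixes = list(rule.get("destinationAddressPrefixes") or [])
        if rule.get("destinationAddressPrefix"):
            prefixes.append(rule["destinationAddressPrefix"])
        # A wider CIDR containing old_ip is reported too: review, don't bulk-edit.
        if any(_covers(p, ip) for p in prefixes):
            names.append(rule["name"])
    return names


rules_json = """[
  {"name": "allow-https", "destinationAddressPrefix": "203.0.113.10",
   "destinationAddressPrefixes": []},
  {"name": "allow-http", "destinationAddressPrefix": null,
   "destinationAddressPrefixes": ["203.0.113.0/24", "198.51.100.7"]},
  {"name": "allow-ssh", "destinationAddressPrefix": "198.51.100.7",
   "destinationAddressPrefixes": []}
]"""
print(stale_rules(json.loads(rules_json), "203.0.113.10"))
# ['allow-https', 'allow-http']
```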

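The same resource group should also contain a resource of type Load balancer. On its Frontend IP configuration tab, double-check that the IP address matches your new IP address. As a bonus, you can check on the Backend pools tab that the addresses there match your internal node IPs.

Finally, don't forget to update your domain's DNS records.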
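To keep the three places in sync, it helps to start from the IP that the controller's LoadBalancer Service actually received. The minimal sketch below assumes the parsed output of kubectl get service &lt;controller-service&gt; -n ingress -o json. The service name depends on your Helm release. The helper names and example IPs are hypothetical.

```python
# Minimal sketch: compare the controller Service's external IP with the
# addresses expected in the load balancer frontend and the DNS record.
# Input is assumed to be parsed `kubectl get service ... -o json` output;
# helper names and IPs are hypothetical.
import json


def external_ips(service):
    """IPs published in status.loadBalancer.ingress of a LoadBalancer Service."""
    entries = service.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    return [entry["ip"] for entry in entries if "ip" in entry]


def ip_mismatches(service, expected):
    """Return {source: ip} for each expected address the Service doesn't use."""
    actual = set(external_ips(service))
    return {src: ip for src, ip in expected.items() if ip not in actual}


service_json = """{
  "spec": {"type": "LoadBalancer"},
  "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.25"}]}}
}"""
expected = {
    "load balancer frontend": "203.0.113.25",
    "DNS A record": "203.0.113.10",  # still the old address
}
print(ip_mismatches(json.loads(service_json), expected))
# {'DNS A record': '203.0.113.10'}
```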

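2. Upgrade your ingress YAML configuration files to match the syntax changes

It took us a while to arrive at a working template. Installing the helloworld application from the Microsoft tutorial mentioned above helped us a lot. We started from this: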

kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: hello-world-ingress
  namespace: services
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: 'false'
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  rules:
    - http:
        paths:
          - path: /hello-world-one(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: aks-helloworld-one
                port:
                  number: 80
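The use-regex annotation matters because of a path like /hello-world-one(/|$)(.*). As far as I understand ingress-nginx, with use-regex enabled the path is rendered roughly as a case-insensitive location ~* "^/hello-world-one(/|$)(.*)". The sketch below uses Python's re module as a stand-in for nginx's PCRE, which should behave the same for this simple pattern. It shows which request paths match and what the second capture group holds. This is an approximation, not ingress-nginx's own code.

```python
# Rough approximation, not ingress-nginx itself: Python's re stands in for
# nginx's PCRE to show how the regex path above matches request paths.
import re

HELLO_WORLD_PATH = re.compile(r"^/hello-world-one(/|$)(.*)", re.IGNORECASE)


def second_group(request_path):
    """Return capture group 2 for a matching path, or None if no match."""
    match = HELLO_WORLD_PATH.match(request_path)
    return None if match is None else match.group(2)


for path in ["/hello-world-one", "/hello-world-one/static/app.css",
             "/Hello-World-One/api", "/hello-world-onex"]:
    print(repr(path), "->", repr(second_group(path)))
# '/hello-world-one' -> ''
# '/hello-world-one/static/app.css' -> 'static/app.css'
# '/Hello-World-One/api' -> 'api'
# '/hello-world-onex' -> None
```

In the ingress-nginx rewrite examples, a rewrite-target such as /$2 forwards only that captured remainder. The rewrite-target: / used above sends every matched request to / instead.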

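After introducing changes step by step, we finally arrived at the configuration below. I'm pretty sure, though, that our actual problem was the missing nginx.ingress.kubernetes.io/use-regex: 'true' entry: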

kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: example-api
  namespace: services
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Forwarded-By: example-api";
    nginx.ingress.kubernetes.io/rewrite-target: /example-api
    nginx.ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  tls:
    - hosts:
        - services.example.com
      secretName: tls-secret
  rules:
    - host: services.example.com
      http:
        paths:
          - path: /example-api
            pathType: ImplementationSpecific
            backend:
              service:
                name: example-api
                port:
                  number: 80

In case anyone wants to install the helloworld application for testing purposes, its YAML files look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-helloworld-one  
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-helloworld-one
  template:
    metadata:
      labels:
        app: aks-helloworld-one
    spec:
      containers:
      - name: aks-helloworld-one
        image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
        ports:
        - containerPort: 80
        env:
        - name: TITLE
          value: "Welcome to Azure Kubernetes Service (AKS)"
---
apiVersion: v1
kind: Service
metadata:
  name: aks-helloworld-one  
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    app: aks-helloworld-one
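The Service reaches the Deployment's pods only because its selector matches the pod template labels. That link is worth double-checking if an Ingress backend returns no pods after the migration. The following is a minimal sketch of equality-based selection, using the label values from the YAML above. The typo label is hypothetical.

```python
# Minimal sketch of equality-based Service selection: a pod is selected when
# every key/value pair of spec.selector appears in its labels.


def selector_matches(selector, pod_labels):
    """True if all selector pairs are present in pod_labels."""
    if not selector:
        # A Service without a selector gets no automatically managed
        # endpoints, so treat it as matching nothing here.
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


service_selector = {"app": "aks-helloworld-one"}
pod_labels = {"app": "aks-helloworld-one"}   # from the Deployment template
typo_labels = {"app": "aks-hellowrold-one"}  # hypothetical typo

print(selector_matches(service_selector, pod_labels))   # True
print(selector_matches(service_selector, typo_labels))  # False
```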

3. Dealing with other crashed applications...

Another application that crashed in our cluster was cert-manager. It was at version 1.0.1, so first we upgraded it to version 1.1.1:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --namespace cert-manager --version 1.1 cert-manager jetstack/cert-manager

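This created a brand new, healthy pod. We were happy and decided to stay on v1.1. We were a bit worried about the extra steps required when upgrading to later versions (see the bottom of this page: https://cert-manager.io/docs/installation/upgrading/).

The cluster is finally fixed now. Right?

4. ...but be sure to check the compatibility charts!

Well... now we know that cert-manager is compatible with Kubernetes v1.22 only from version 1.5 onwards. We were unlucky: that very night our SSL certificate crossed the 30-days-before-expiry threshold, so cert-manager decided to renew it! The renewal failed and cert-manager crashed. Kubernetes fell back to the "Kubernetes Ingress Controller Fake Certificate". Because that certificate is not valid, browsers blocked the traffic and our web pages went down again. The fix was to upgrade cert-manager to 1.5 and upgrade the CRDs at the same time: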

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.5.4/cert-manager.crds.yaml
helm upgrade --namespace cert-manager --version 1.5 cert-manager jetstack/cert-manager

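After this, the new cert-manager instance successfully refreshed our certificates. Cluster saved again.

If you need to force a renewal, take a look at this issue: https://github.com/jetstack/cert-manager/issues/2641

@ajcann suggests adding a renewBefore property to the certificates: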

kubectl get certs --no-headers=true | awk '{print $1}' | xargs -n 1 kubectl patch certificate --patch '
- op: replace
  path: /spec/renewBefore
  value: 1440h
' --type=json

Then wait for the certificates to be renewed, and remove the property again:

kubectl get certs --no-headers=true | awk '{print $1}' | xargs -n 1 kubectl patch certificate --patch '
- op: remove
  path: /spec/renewBefore
' --type=json
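Why does the temporary renewBefore patch force a renewal? Below is a minimal sketch of the timing rule. It assumes cert-manager schedules renewal at notAfter minus renewBefore, which is how I read the renewBefore semantics. The dates are hypothetical. 1440h is 60 days, so any certificate with fewer than 60 days left becomes due immediately. The 30-day threshold mentioned above would leave a certificate with 45 days remaining alone.

```python
# Minimal sketch: with renewBefore set, renewal is due once
# now >= notAfter - renewBefore. Dates are hypothetical.
from datetime import datetime, timedelta, timezone


def renewal_due(not_after, renew_before, now):
    """True once 'now' has reached notAfter minus renewBefore."""
    return now >= not_after - renew_before


now = datetime(2022, 2, 1, tzinfo=timezone.utc)
not_after = now + timedelta(days=45)  # certificate expires in 45 days

print(renewal_due(not_after, timedelta(days=30), now))     # False: 30-day window
print(renewal_due(not_after, timedelta(hours=1440), now))  # True: 1440h = 60 days
```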