NEG is not attached to any BackendService with health checking

I'm getting downtime when I deploy my application running on GKE with a rolling update:

strategy:
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 0
  type: RollingUpdate

I checked the events on my pod, and the last event was this one:

NEG is not attached to any Backend Service with health checking. Marking condition "cloud.google.com/load-balancer-neg-ready" to True.

My pod has a livenessProbe and a startupProbe like these:

livenessProbe:
  failureThreshold: 1
  httpGet:
    path: /healthz
    port: http
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

startupProbe:
  failureThreshold: 30
  httpGet:
    path: /healthz
    port: http
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

I checked my load balancer logs and found this:

{
httpRequest: {
latency: "0.002246s"
remoteIp: "myIP"
requestMethod: "GET"
requestSize: "37"
requestUrl: "https://www.myurl/"
responseSize: "447"
status: 502
userAgent: "curl/7.77.0"
}
insertId: "1mk"
jsonPayload: {3}
logName: "myproject/logs/requests"
receiveTimestamp: "2022-02-15T15:30:52.085256523Z"
resource: {
labels: {6}
type: "http_load_balancer"
}
severity: "WARNING"
spanId: "b75e2f583a0e9e25"
timestamp: "2022-02-15T15:30:51.270776Z"
trace: "myproject/traces/32c488f48a392ac42358be0f"
}

Here is the deployment spec, as requested:

spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: app
      app.kubernetes.io/name: myname
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum/config: 4920135cd08336150d3184cc1af
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: app
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: webapp-server
        app.kubernetes.io/part-of: webapp
        helm.sh/chart: myapp-1.0.0
    spec:
      containers:
      - env:
        - name: ENV VAR
          value: Hello
        envFrom:
        - configMapRef:
            name: myapp
        - secretRef:
            name: myapp-credentials
        image: imagelink
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 1
          httpGet:
            path: /healthz
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: node
        ports:
        - containerPort: 3000
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 256Mi
        startupProbe:
          failureThreshold: 30
          httpGet:
            path: /healthz
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst

What can I change to avoid downtime when performing a rollingUpdate?

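For zero-downtime updates, you should consider running more than one pod.
You can also tune the maxSurge and maxUnavailable values (1).
A one-second timeout seems a bit low; consider raising these values.
Finally, you can find a thorough guide to rolling updates in the Google docs.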
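A minimal sketch of those suggestions as a fragment of the Deployment spec. With `replicas: 1`, Kubernetes rounds a percentage `maxSurge` up, so `25%` already means one extra pod; the point of the sketch is mainly the extra replica and the less aggressive probe settings. The replica count, `maxSurge: 1`, `timeoutSeconds: 5` and `failureThreshold: 3` are illustrative assumptions, not values given in the answer.

```yaml
# Hedged sketch: replica count, surge values and probe timings are assumed
# for illustration; merge into the existing Deployment rather than replacing it.
spec:
  replicas: 2                  # more than one pod, so one keeps serving while another is replaced
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # create at most one extra pod during the rollout
      maxUnavailable: 0        # never remove an old pod before its replacement is available
  template:
    spec:
      containers:
      - name: node
        livenessProbe:
          httpGet:
            path: /healthz
            port: http
            scheme: HTTP
          periodSeconds: 10
          timeoutSeconds: 5      # raised from 1s (assumed value)
          failureThreshold: 3    # tolerate a transient failure instead of restarting on the first one
```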

This worked for me by adding the following:

lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - sleep 60

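This basically gives the old pod 60 seconds before it receives SIGTERM, so it can keep serving in-flight requests while the new pod starts up and takes over new requests.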
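One caveat worth checking: the pod's termination grace period countdown starts before the preStop hook runs, and the default `terminationGracePeriodSeconds` is 30. If the hook is still running when the grace period expires, the kubelet only grants a short extension (about 2 seconds) and then kills the container, so a `sleep 60` would be cut short. A sketch of the fragment with a longer grace period; the value 90 is an assumption, anything comfortably above 60 plus your app's shutdown time should work.

```yaml
# Hedged sketch: terminationGracePeriodSeconds must cover the preStop sleep,
# because the grace-period countdown already includes the preStop hook.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 90   # assumed value; must exceed the 60s sleep (default is 30)
      containers:
      - name: node
        lifecycle:
          preStop:
            exec:
              # assumes the image ships /bin/sh, as in the original snippet
              command: ["/bin/sh", "-c", "sleep 60"]
```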