NEG is not attached to any BackendService with health checking
I'm running into downtime when I deploy an application running on GKE using a rolling update.
rollingUpdate:
  maxSurge: 25%
  maxUnavailable: 0
type: RollingUpdate
I looked at the events on my pod, and the last one is this:
NEG is not attached to any Backend Service with health checking. Marking condition "cloud.google.com/load-balancer-neg-ready" to True.
On my pod I have a livenessProbe and a startupProbe configured like this:
livenessProbe:
  failureThreshold: 1
  httpGet:
    path: /healthz
    port: http
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
startupProbe:
  failureThreshold: 30
  httpGet:
    path: /healthz
    port: http
    scheme: HTTP
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
I also looked at my LB logs and found this:
{
  httpRequest: {
    latency: "0.002246s"
    remoteIp: "myIP"
    requestMethod: "GET"
    requestSize: "37"
    requestUrl: "https://www.myurl/"
    responseSize: "447"
    status: 502
    userAgent: "curl/7.77.0"
  }
  insertId: "1mk"
  jsonPayload: {3}
  logName: "myproject/logs/requests"
  receiveTimestamp: "2022-02-15T15:30:52.085256523Z"
  resource: {
    labels: {6}
    type: "http_load_balancer"
  }
  severity: "WARNING"
  spanId: "b75e2f583a0e9e25"
  timestamp: "2022-02-15T15:30:51.270776Z"
  trace: "myproject/traces/32c488f48a392ac42358be0f"
}
Here is the deployment spec, as requested:
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: app
      app.kubernetes.io/name: myname
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum/config: 4920135cd08336150d3184cc1af
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: app
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: webapp-server
        app.kubernetes.io/part-of: webapp
        helm.sh/chart: myapp-1.0.0
    spec:
      containers:
      - env:
        - name: ENV VAR
          value: Hello
        envFrom:
        - configMapRef:
            name: myapp
        - secretRef:
            name: myapp-credentials
        image: imagelink
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 1
          httpGet:
            path: /healthz
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: node
        ports:
        - containerPort: 3000
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 256Mi
        startupProbe:
          failureThreshold: 30
          httpGet:
            path: /healthz
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
What can I change to avoid downtime while performing a rollingUpdate?
For zero-downtime updates you should consider running more than one pod. You can also tune the maxSurge and maxUnavailable values (1). The one-second probe timeouts also seem a bit low; consider raising them. Finally, you can find an exhaustive guide to rolling updates in the Google docs. A sketch of these suggestions is below.
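A minimal sketch of what those suggestions could look like in the Deployment; the concrete numbers (3 replicas, a 5-second timeout, and so on) are illustrative assumptions, not values taken from the question:

replicas: 3                    # more than one pod, so some replicas keep serving during the update
strategy:
  rollingUpdate:
    maxSurge: 1                # start one replacement pod before removing an old one
    maxUnavailable: 0          # never take a pod away until its replacement is Ready
  type: RollingUpdate
# ...and, per container, a more forgiving liveness probe:
livenessProbe:
  httpGet:
    path: /healthz
    port: http
    scheme: HTTP
  periodSeconds: 10
  timeoutSeconds: 5            # raised from 1s, so a briefly slow /healthz is tolerated
  failureThreshold: 3          # raised from 1, so a single failed probe no longer restarts the pod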
This worked for me, by adding the following:
lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - sleep 60
This essentially gives the pod a 60-second buffer to handle the SIGTERM and finish serving old requests while the new pods spin up and take over new ones.
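One caveat, assuming Kubernetes defaults: the preStop hook runs inside the pod's termination grace period, which defaults to 30 seconds, so a 60-second sleep would be cut off by SIGKILL partway through unless the grace period is raised as well. A minimal sketch (the 90-second figure is an assumption, simply something comfortably above the sleep):

spec:
  terminationGracePeriodSeconds: 90   # must exceed the 60s preStop sleep; the default 30s would truncate it
  containers:
  - name: node
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 60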