GKE NEG Ingress always returns 502 Bad Gateway
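I have set up a StatefulSet, a Service with NEG, and an Ingress on my Google Cloud Kubernetes Engine cluster.
Every workload and network object is ready and healthy. The Ingress is created and the NEG status is updated for all the services. The VPC-native (Alias-IP) and HTTP Load Balancer options are enabled for the cluster.
But when I try to access my app using the path specified in my Ingress, I always get a 502 (Bad Gateway) error.
Here are my configurations (names are redacted, including the image name):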
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
  labels:
    app: myapp
  name: myapp
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: tcp
  selector:
    app: myapp
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: myapp
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  serviceName: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        livenessProbe:
          httpGet:
            path: /
            port: tcp
            scheme: HTTP
          initialDelaySeconds: 60
        image: myapp:8bebbaf
        ports:
        - containerPort: 1880
          name: tcp
          protocol: TCP
        readinessProbe:
          failureThreshold: 1
          httpGet:
            path: /
            port: tcp
            scheme: HTTP
        volumeMounts:
        - mountPath: /data
          name: data
      securityContext:
        fsGroup: 1000
      terminationGracePeriodSeconds: 10
  volumeClaimTemplates:
  - metadata:
      labels:
        app: myapp
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: myapp-ingress
spec:
  rules:
  - http:
      paths:
      - path: /workflow
        backend:
          serviceName: myapp
          servicePort: 80
What is wrong with it, and how can I fix it?
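After much digging and testing, I finally found what was wrong. As a side note, GKE NEG Ingress does not seem very stable (NEG is in fact still in beta) and does not always conform to the Kubernetes specification.
There is an issue with GKE Ingress related to named ports in the targetPort field. The fix is implemented and available from cluster version 1.16.0-gke.20 (Release), which as of today (February 2020) is available under the Rapid channel. I have not tested the fix, because I ran into other problems with an Ingress on a version from that channel.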
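To decide whether a cluster is already at or past the version named above, the version string can be compared numerically. This is only a minimal sketch: the function names are hypothetical, the `1.15.9-gke.8` sample is a made-up version string, and it assumes versions have the `MAJOR.MINOR.PATCH-gke.N` shape seen in `1.16.0-gke.20`. It says nothing about whether the fix was backported to older release lines.

```python
import re

# The version the answer names as the first one containing the fix.
FIXED_IN = (1, 16, 0, 20)  # 1.16.0-gke.20


def parse_gke_version(version):
    """Split a string like '1.16.0-gke.20' into a tuple of four ints."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def at_or_after_fix(version):
    """Return True if the version is 1.16.0-gke.20 or later."""
    return parse_gke_version(version) >= FIXED_IN


print(at_or_after_fix("1.15.9-gke.8"))   # False (made-up older version)
print(at_or_after_fix("1.16.0-gke.20"))  # True
```

Tuples compare element by element, so the patch and `gke.N` components only matter when the major and minor numbers are equal.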
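So basically there are two options if you run into the same issue:
Specify the exact port number, not the port name, in the targetPort field of your Service. Here is the fixed Service config file from my example: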
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
  labels:
    app: myapp
  name: myapp
spec:
  ports:
  - port: 80
    protocol: TCP
    # !!!
    # targetPort: tcp
    # Must match the containerPort (1880) declared in the StatefulSet.
    targetPort: 1880
  selector:
    app: myapp
Upgrade your GKE cluster to version 1.16.0-gke.20 or later (I have not tested this myself).
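For the first option, the number to put in targetPort is whatever the named port resolves to in the pod template. The sketch below shows that lookup on plain dictionaries copied from the manifests in the question. `resolve_target_port` is a hypothetical helper written for illustration, not part of GKE, kubectl, or any Kubernetes client library.

```python
def resolve_target_port(service_port, pod_spec):
    """Return the numeric container port a Service port entry points at.

    A numeric targetPort is returned unchanged; a named one is looked up
    among the containers' declared ports. When targetPort is omitted,
    Kubernetes defaults it to the value of `port`.
    """
    target = service_port.get("targetPort", service_port["port"])
    if isinstance(target, int):
        return target
    for container in pod_spec.get("containers", []):
        for port in container.get("ports", []):
            if port.get("name") == target:
                return port["containerPort"]
    raise LookupError(f"no container port named {target!r}")


# Values copied from the Service and StatefulSet in the question.
service_port = {"port": 80, "protocol": "TCP", "targetPort": "tcp"}
pod_spec = {
    "containers": [
        {
            "name": "myapp",
            "ports": [{"containerPort": 1880, "name": "tcp", "protocol": "TCP"}],
        }
    ]
}

print(resolve_target_port(service_port, pod_spec))  # 1880
```

Here the named port `tcp` resolves to 1880, which is the number the Service needs in place of the name.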