Unschedulable Kubernetes pods on GCP using Autoscaler
I have a Kubernetes cluster in which the pods autoscale using Autopilot. They suddenly stopped autoscaling. I'm new to Kubernetes, and I don't know what to do or what to show from the console when asking for help.
The pods become unschedulable on their own; inside the cluster their status is Pending instead of Running, and I can't exec into or interact with them.
I also can't delete or stop them from the GCP console. There shouldn't be any problem with insufficient memory or CPU, since not much is running on the server.
The cluster worked as expected until I ran into this problem.
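Below is the describe output for the stuck pods. For reference, this is roughly how it was collected (a sketch; the pod name is a placeholder for one of your Pending pods):

kubectl get pods -n default
kubectl describe pod <pending-pod-name> -n default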
Namespace: default
Priority: 0
Node: <none>
Labels: app=odoo-service
pod-template-hash=5bd88899d7
Annotations: seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/odoo-cluster-dev-5bd88899d7
Containers:
odoo-service:
Image: us-central1-docker.pkg.dev/adams-dev/adams-odoo/odoo-service:v58
Port: <none>
Host Port: <none>
Limits:
cpu: 2
ephemeral-storage: 1Gi
memory: 8Gi
Requests:
cpu: 2
ephemeral-storage: 1Gi
memory: 8Gi
Environment:
ODOO_HTTP_SOCKET_TIMEOUT: 30
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zqh5r (ro)
cloud-sql-proxy:
Image: gcr.io/cloudsql-docker/gce-proxy:1.17
Port: <none>
Host Port: <none>
Command:
/cloud_sql_proxy
-instances=adams-dev:us-central1:odoo-test=tcp:5432
Limits:
cpu: 1
ephemeral-storage: 1Gi
memory: 2Gi
Requests:
cpu: 1
ephemeral-storage: 1Gi
memory: 2Gi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zqh5r (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-zqh5r:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NotTriggerScaleUp 28m (x248 over 3h53m) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 in backoff after failed scale-up, 2 Insufficient cpu, 2 Insufficient memory
Normal NotTriggerScaleUp 8m1s (x261 over 3h55m) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient memory, 4 in backoff after failed scale-up, 2 Insufficient cpu
Normal NotTriggerScaleUp 3m (x1646 over 3h56m) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 2 Insufficient memory, 4 in backoff after failed scale-up
Warning FailedScheduling 20s (x168 over 3h56m) gke.io/optimize-utilization-scheduler 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NotTriggerScaleUp 28m (x250 over 3h56m) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient memory, 4 in backoff after failed scale-up, 2 Insufficient cpu
Normal NotTriggerScaleUp 8m2s (x300 over 3h55m) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 in backoff after failed scale-up, 2 Insufficient cpu, 2 Insufficient memory
Warning FailedScheduling 5m21s (x164 over 3h56m) gke.io/optimize-utilization-scheduler 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory.
Normal NotTriggerScaleUp 3m1s (x1616 over 3h55m) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 2 Insufficient memory, 4 in backoff after failed scale-up
I'm not sure how much of this I can debug or fix myself.
The pods cannot be scheduled on any node because none of the nodes has enough CPU available.
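For scale: this pod requests 3 vCPU and 10 GiB of memory in total (2 CPU / 8Gi for odoo-service plus 1 CPU / 2Gi for cloud-sql-proxy), and because the QoS class is Guaranteed the limits equal the requests, so nothing smaller will fit it. You can compare that against what each node can actually hand out (standard kubectl commands; kubectl top needs the metrics server, which GKE ships by default):

kubectl describe nodes | grep -A 7 Allocatable
kubectl top nodes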
The cluster autoscaler tried to scale up, but it is in backoff after failed scale-up attempts, which suggests a problem with scaling the managed instance groups that back the node pool.
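To see why the scale-up attempts keep failing, you can look at recent cluster events; on a Standard cluster the autoscaler also publishes a status ConfigMap (a sketch; that ConfigMap is typically not readable on Autopilot, where kube-system access is locked down):

kubectl get events -A --sort-by=.lastTimestamp | grep -i scale
kubectl describe configmap cluster-autoscaler-status -n kube-system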
The cluster autoscaler tried to scale up but could not add new nodes because a quota limit has been reached.
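You can check this against the Compute Engine quotas for the cluster's region (taking the project and region from the pod spec above; the grep just pulls the CPU entry out of the YAML output):

gcloud compute regions describe us-central1 --project adams-dev | grep -B 1 -A 1 "metric: CPUS"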
You can't see the Autopilot GKE VMs, but they still count against your quota.
Try creating the Autopilot cluster in a different region. If an Autopilot cluster no longer meets your needs, switch to a Standard cluster.
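If you do try another region or fall back to Standard, the corresponding gcloud commands look roughly like this (a sketch; the cluster names and the alternate region are placeholders, not values from the question):

gcloud container clusters create-auto odoo-cluster-dev-2 --region us-east1 --project adams-dev   # Autopilot in another region
gcloud container clusters create odoo-cluster-standard --zone us-central1-a --project adams-dev  # Standard cluster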