GCloud Kubernetes cluster with 1 Insufficient cpu error

Question

I created a Kubernetes cluster on Google Cloud using:

gcloud container clusters create my-app-cluster --num-nodes=1

I then deployed my 3 applications (backend, frontend and a scraper) and created a load balancer. I used the following configuration file:

apiVersion: apps/v1
kind: Deployment
metadata:
    name: my-app-deployment
    labels:
        app: my-app
spec:
    replicas: 1
    selector:
        matchLabels:
            app: my-app
    template:
        metadata:
            labels:
                app: my-app
        spec:
            containers:
              - name: my-app-server
                image: gcr.io/my-app/server
                ports:
                  - containerPort: 8009
                envFrom:
                  - secretRef:
                        name: my-app-production-secrets
              - name: my-app-scraper
                image: gcr.io/my-app/scraper
                ports:
                  - containerPort: 8109
                envFrom:
                  - secretRef:
                        name: my-app-production-secrets
              - name: my-app-frontend
                image: gcr.io/my-app/frontend
                ports:
                  - containerPort: 80
                envFrom:
                  - secretRef:
                        name: my-app-production-secrets

---

apiVersion: v1
kind: Service
metadata:
    name: my-app-lb-service
spec:
    type: LoadBalancer
    selector:
        app: my-app
    ports:
      - name: my-app-server-port
        protocol: TCP
        port: 8009
        targetPort: 8009
      - name: my-app-scraper-port
        protocol: TCP
        port: 8109
        targetPort: 8109
      - name: my-app-frontend-port
        protocol: TCP
        port: 80
        targetPort: 80

When I run kubectl get pods I get:

NAME                                   READY     STATUS    RESTARTS   AGE
my-app-deployment-6b49c9b5c4-5zxw2   0/3       Pending   0          12h

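When I look into it in Google Cloud, I see that the pod has an "Unschedulable" status with an "Insufficient cpu" error.

When I go to the "Nodes" section under my cluster on the "Clusters" page, I see 681 mCPU requested and 940 mCPU allocatable.

What is wrong? Why doesn't my pod start?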

Answer 1

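Every container has a default CPU request (in GKE I've noticed it is 0.1 CPU, or 100m). Assuming these defaults, you have three containers in that pod, so you are requesting another 0.3 CPU.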
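To make the units concrete, here is a minimal sketch that converts Kubernetes CPU quantity strings such as "100m", "0.1" or "1" into millicores and totals the pod's requests, assuming the 100m per-container default observed above. The helper name `cpu_to_millicores` is hypothetical, and it only handles the plain-decimal and `m` (milli) forms, not every quantity format Kubernetes accepts.

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "100m", "0.1" or "2" into millicores.

    Hypothetical, simplified helper: only plain decimals and the "m" suffix
    are handled; this is not a full Kubernetes quantity parser.
    """
    if quantity.endswith("m"):
        return int(Decimal(quantity[:-1]))
    # Whole or fractional cores: 1 core == 1000 millicores.
    return int(Decimal(quantity) * 1000)


# Assumed default request of 100m for each of the pod's three containers.
default_requests = ["100m", "100m", "100m"]
pod_request = sum(cpu_to_millicores(q) for q in default_requests)
print(pod_request)  # 300
```

So "0.1" and "100m" describe the same request, and three containers at that default add up to 300m (0.3 CPU).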

The node has 0.68 CPU (680m) requested by other workloads, and the node's total limit (allocatable) is 0.94 CPU (940m).

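If you want to see which workloads are reserving that 0.68 CPU, you need to inspect the pods on the node. The GKE page that shows the resource allocations and limits for each node lets you click a node, which takes you to a page with this information.
In my case I can see 2 kube-dns pods taking 0.26 CPU each, among others. These are system pods needed to operate the cluster correctly. What you see also depends on which add-on services you have selected, for example HTTP load balancing (Ingress), the Kubernetes Dashboard, and so on.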

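Together with the 0.68 CPU already requested, your pod would bring the node to 0.98 CPU, which exceeds the node's 0.94 CPU allocatable. That is why your pod cannot start.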

Note that scheduling is based on the amount of CPU each workload requests, not on the amount it actually uses or on its limit.
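The CPU part of that scheduling decision can be approximated as: a pod fits on a node only if the CPU already requested on the node plus the pod's own requests stays within the node's allocatable CPU. The sketch below models just that one check, assuming the numbers from the question (681m requested, 940m allocatable) and the assumed 100m default per container. `fits_on_node` is a hypothetical name, and the real scheduler also considers memory, taints, affinity and other constraints.

```python
def fits_on_node(node_allocatable_m: int, node_requested_m: int,
                 pod_container_requests_m: list[int]) -> bool:
    """Return True if the pod's summed CPU requests fit in the node's free allocatable CPU.

    Hypothetical, CPU-only model: only requests are compared; actual usage
    and limits play no part, mirroring how placement is decided.
    """
    pod_request_m = sum(pod_container_requests_m)
    return node_requested_m + pod_request_m <= node_allocatable_m


# Numbers from the question: 681m already requested, 940m allocatable,
# plus three containers at an assumed 100m default each.
print(681 + 300)                                # 981, more than 940
print(fits_on_node(940, 681, [100, 100, 100]))  # False -> "Insufficient cpu"
```

Because only requests are compared, the pod stays Pending even if the node's actual CPU usage is low.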

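Your options:

Turn off any add-on services you don't need that are taking up CPU resources.
Add more CPU resources to your cluster. To do that, either change your node pool to use VMs with more CPUs, or increase the number of nodes in the existing pool. You can do this in the GKE console or with the gcloud command line.
Explicitly request less CPU in your containers, which overrides the default: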

apiVersion: apps/v1
kind: Deployment
...
        spec:
            containers:
              - name: my-app-server
                image: gcr.io/my-app/server
                ...
                resources:
                  requests:
                    cpu: "50m"
              - name: my-app-scraper
                image: gcr.io/my-app/scraper
                ...
                resources:
                  requests:
                    cpu: "50m"
              - name: my-app-frontend
                image: gcr.io/my-app/frontend
                ...
                resources:
                  requests:
                    cpu: "50m"
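Plugging the reduced requests into a simplified, requests-only fit check shows why this helps: three containers at 50m each request 150m, and 681m + 150m = 831m, which is within the 940m allocatable. This sketch assumes the node's other requests stay at the 681m shown in the question (any new workload on the node would change the result); `fits_on_node` and `max_each` are hypothetical names used only for illustration.

```python
def fits_on_node(node_allocatable_m: int, node_requested_m: int,
                 pod_container_requests_m: list[int]) -> bool:
    """Hypothetical CPU-only fit check: compares requests, not usage or limits."""
    return node_requested_m + sum(pod_container_requests_m) <= node_allocatable_m


reduced = [50, 50, 50]  # the "50m" request set on each container above
print(681 + sum(reduced))               # 831
print(fits_on_node(940, 681, reduced))  # True

# Largest equal per-container request that still fits in the 259m headroom.
max_each = (940 - 681) // len(reduced)
print(max_each)                         # 86
```

In other words, under these assumptions any equal per-container request up to about 86m would let the pod schedule on this node.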
