Is there any way to drain CloudWatch Container Insight nodes with autoscaler on EKS?

Cluster spec:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: mixedCluster
  region: ap-southeast-1

nodeGroups:
  - name: scale-spot
    desiredCapacity: 1
    maxSize: 10
    instancesDistribution:
      instanceTypes: ["t2.small", "t3.small"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
    availabilityZones: ["ap-southeast-1a", "ap-southeast-1b"]
    iam:
      withAddonPolicies:
        autoScaler: true
    labels:
      nodegroup-type: stateless-workload
      instance-type: spot
    ssh:
      publicKeyName: newkeypairbro

availabilityZones: ["ap-southeast-1a", "ap-southeast-1b"]

Problem:

CloudWatch pods are created automatically on every node when I scale my application (business) pods up. However, when I scale my business pods down to zero, the cluster autoscaler does not drain or terminate the nodes that are left running only the CloudWatch pods. As a result, a dummy node remains in my cluster.

According to the screenshot above, the last node is the dummy node, which contains only the CloudWatch pods:

Expected result:

How can I gracefully (and automatically) drain the node that holds only the Amazon CloudWatch pods after my business pods terminate, so that no dummy node is left behind?
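One approach that is often suggested for this situation (a sketch, not verified against this cluster) is to mark the Container Insights pods as safe to evict, so the autoscaler does not keep a node alive just for them. The DaemonSet name `cloudwatch-agent` and the namespace `amazon-cloudwatch` below are the names used by the standard Container Insights setup; adjust them if your deployment differs:

```yaml
# Sketch: annotate the Container Insights DaemonSet pod template so
# cluster-autoscaler treats its pods as evictable during scale-down.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent          # assumed standard Container Insights name
  namespace: amazon-cloudwatch
spec:
  template:
    metadata:
      annotations:
        # Tells cluster-autoscaler these pods do not block node removal
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```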


This is my autoscaler configuration:

Name:                   cluster-autoscaler
Namespace:              kube-system
CreationTimestamp:      Sun, 11 Apr 2021 20:44:28 +0700
Labels:                 app=cluster-autoscaler
Annotations:            cluster-autoscaler.kubernetes.io/safe-to-evict: false
                        deployment.kubernetes.io/revision: 2
Selector:               app=cluster-autoscaler
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=cluster-autoscaler
  Annotations:      prometheus.io/port: 8085
                    prometheus.io/scrape: true
  Service Account:  cluster-autoscaler
  Containers:
   cluster-autoscaler:
    Image:      k8s.gcr.io/autoscaling/cluster-autoscaler:v1.18.3
    Port:       <none>
    Host Port:  <none>
    Command:
      ./cluster-autoscaler
      --v=4
      --stderrthreshold=info
      --cloud-provider=aws
      --skip-nodes-with-local-storage=false
      --expander=least-waste
      --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/mixedCluster
    Limits:
      cpu:     100m
      memory:  300Mi
    Requests:
      cpu:        100m
      memory:     300Mi
    Environment:  <none>
    Mounts:
      /etc/ssl/certs/ca-certificates.crt from ssl-certs (ro)
  Volumes:
   ssl-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs/ca-bundle.crt
    HostPathType:
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   cluster-autoscaler-54ccd944f6 (1/1 replicas created)
Events:          <none>

What I have tried:

I tried to scale it down manually with this command:

eksctl scale nodegroup --cluster=mixedCluster --nodes=1 --name=scale-spot

It doesn't work, and returns:

[ℹ]  scaling nodegroup stack "eksctl-mixedCluster-nodegroup-scale-spot" in cluster eksctl-mixedCluster-cluster
[ℹ]  no change for nodegroup "scale-spot" in cluster "eksctl-mixedCluster-cluster": nodes-min 1, desired 1, nodes-max 10
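As a manual workaround (a sketch only; the node name below is hypothetical and must be replaced with the actual dummy node reported by `kubectl get nodes`), the dummy node can be cordoned and drained directly, after which it becomes an empty scale-down candidate for the cluster autoscaler:

```shell
# List nodes to identify the dummy node (name below is a placeholder)
kubectl get nodes

# Evict everything except DaemonSet pods (such as the CloudWatch agent);
# once empty, the node is eligible for removal by the autoscaler
kubectl drain ip-192-168-0-0.ap-southeast-1.compute.internal \
  --ignore-daemonsets \
  --delete-emptydir-data
```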

Never mind, I have solved my own problem. Since my cluster was using t2.small and t3.small instances, there were too few resources to trigger the autoscaler to scale down the dummy node. I tried larger instance types, t3a.medium and t3.medium, and it works fine.
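For reference, the scale-down behaviour involved here is governed by a couple of cluster-autoscaler flags; the sketch below shows the upstream defaults (these values are not taken from this cluster's deployment):

```yaml
# Excerpt of a cluster-autoscaler container spec (flag values are upstream defaults)
command:
  - ./cluster-autoscaler
  # A node is a scale-down candidate only when the sum of its pods'
  # resource requests falls below this fraction of the node's allocatable
  - --scale-down-utilization-threshold=0.5
  # The node must remain unneeded for this long before it is removed
  - --scale-down-unneeded-time=10m
```

This is consistent with the observation above: on very small instances, even a handful of system pods can keep a node's requested utilization above the threshold, so the node is never considered unneeded.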