Is there any way to drain CloudWatch Container Insight nodes with autoscaler on EKS?
Cluster spec:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: mixedCluster
  region: ap-southeast-1

nodeGroups:
  - name: scale-spot
    desiredCapacity: 1
    maxSize: 10
    instancesDistribution:
      instanceTypes: ["t2.small", "t3.small"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
    availabilityZones: ["ap-southeast-1a", "ap-southeast-1b"]
    iam:
      withAddonPolicies:
        autoScaler: true
    labels:
      nodegroup-type: stateless-workload
      instance-type: spot
    ssh:
      publicKeyName: newkeypairbro

availabilityZones: ["ap-southeast-1a", "ap-southeast-1b"]
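For reference, the spec is applied with eksctl, and the nodegroup labels can be checked once the nodes come up (the file name cluster.yaml is only an assumption here):

# Create the cluster from the spec above (file name is assumed)
eksctl create cluster -f cluster.yaml

# Verify the spot nodegroup labels landed on the nodes
kubectl get nodes -l nodegroup-type=stateless-workload --show-labels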
Problem:
CloudWatch pods are created automatically on every node as I scale my application (business pods) up. However, when I scale my business pods down to zero, the cluster autoscaler does not drain or terminate the CloudWatch pods left inside some nodes, so a dummy node is left behind in my cluster.
In the screenshot above, the last node is the dummy node and it only contains the CloudWatch pods:
Expected outcome:
How do I gracefully (and automatically) drain the Amazon CloudWatch node after my business pods terminate, so that no dummy node is created?
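For reference, a quick way to confirm that only the CloudWatch agent pods are left on that node is to list its pods directly (the node name below is just a placeholder):

# List everything still scheduled on the suspected dummy node
kubectl get pods --all-namespaces -o wide \
  --field-selector spec.nodeName=ip-192-168-xx-xx.ap-southeast-1.compute.internal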
Here is my autoscaler configuration:
Name:                   cluster-autoscaler
Namespace:              kube-system
CreationTimestamp:      Sun, 11 Apr 2021 20:44:28 +0700
Labels:                 app=cluster-autoscaler
Annotations:            cluster-autoscaler.kubernetes.io/safe-to-evict: false
                        deployment.kubernetes.io/revision: 2
Selector:               app=cluster-autoscaler
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=cluster-autoscaler
  Annotations:      prometheus.io/port: 8085
                    prometheus.io/scrape: true
  Service Account:  cluster-autoscaler
  Containers:
   cluster-autoscaler:
    Image:      k8s.gcr.io/autoscaling/cluster-autoscaler:v1.18.3
    Port:       <none>
    Host Port:  <none>
    Command:
      ./cluster-autoscaler
      --v=4
      --stderrthreshold=info
      --cloud-provider=aws
      --skip-nodes-with-local-storage=false
      --expander=least-waste
      --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/mixedCluster
    Limits:
      cpu:     100m
      memory:  300Mi
    Requests:
      cpu:     100m
      memory:  300Mi
    Environment:  <none>
    Mounts:
      /etc/ssl/certs/ca-certificates.crt from ssl-certs (ro)
  Volumes:
   ssl-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs/ca-bundle.crt
    HostPathType:
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   cluster-autoscaler-54ccd944f6 (1/1 replicas created)
Events:          <none>
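Since the deployment runs with --v=4, the scale-down evaluation should show up in its logs; grepping for the dummy node's name (placeholder below, exact log wording varies by autoscaler version) is a rough way to see why the node is being kept:

# Inspect the autoscaler's recent scale-down reasoning for the dummy node
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=500 \
  | grep -i ip-192-168-xx-xx.ap-southeast-1.compute.internal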
My attempts:
I tried to scale it down manually with this command:
eksctl scale nodegroup --cluster=mixedCluster --nodes=1 --name=scale-spot
It doesn't work, and it returns:
[ℹ] scaling nodegroup stack "eksctl-mixedCluster-nodegroup-scale-spot" in cluster eksctl-mixedCluster-cluster
[ℹ] no change for nodegroup "scale-spot" in cluster "eksctl-mixedCluster-cluster": nodes-min 1, desired 1, nodes-max 10
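The command reports no change because the nodegroup's desired size is already recorded as 1; it does not pick which node goes away. If the dummy node had to be removed by hand, a rough sketch like the following could work (node name and instance id are placeholders; on older kubectl versions the drain flag is --delete-local-data instead of --delete-emptydir-data):

# Cordon and drain the dummy node; DaemonSet pods (the CloudWatch agents) are skipped
kubectl drain ip-192-168-xx-xx.ap-southeast-1.compute.internal \
  --ignore-daemonsets --delete-emptydir-data

# Terminate the underlying instance and shrink the ASG desired capacity with it
aws autoscaling terminate-instance-in-auto-scaling-group \
  --instance-id i-0123456789abcdef0 \
  --should-decrement-desired-capacity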
Never mind, I have solved my own issue. Since my cluster was using t2.small and t3.small instances, the resources were too small to trigger the autoscaler to scale down the dummy node. I tried larger instance types, t3a.medium and t3.medium, and it works fine.
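A likely explanation, for completeness: the CloudWatch agents run as DaemonSets with fixed resource requests, and by default the cluster autoscaler includes DaemonSet requests when computing node utilization, so on a t2.small or t3.small even an otherwise empty node can sit above the default 50% scale-down utilization threshold and is never marked as unneeded, while on a t3.medium the same requests fall well below it. If the small instances had to be kept, a rough alternative (untested here, values only illustrative) would be loosening those knobs on the cluster-autoscaler command shown above:

# Same command as in the deployment above, with two scale-down knobs added (illustrative values)
./cluster-autoscaler \
  --v=4 \
  --stderrthreshold=info \
  --cloud-provider=aws \
  --skip-nodes-with-local-storage=false \
  --expander=least-waste \
  --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/mixedCluster \
  --scale-down-utilization-threshold=0.7 \
  --ignore-daemonsets-utilization=true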