Kubernetes HPA doesn't scale up
This is strange. I'm using an AWS EKS cluster, and it worked fine with my HPA yesterday and this morning. Since this afternoon, nothing: my HPA suddenly stopped scaling!!
Here is my HPA:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my_hpa_name
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my_deployment_name
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: my_metrics # MUST match the metric name on the custom metrics API
      target:
        type: AverageValue
        averageValue: 5
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30  # window to consider while scaling up; defaults to 0s if empty
    scaleDown:
      stabilizationWindowSeconds: 300 # window to consider while scaling down; defaults to 300s if empty
And when I started testing, I tried many times, but it never scaled up:
NAME       REFERENCE                    TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
xxxx-hpa   Deployment/xxxx-deployment   <unknown>/5   1         10        0          5s
xxxx-hpa   Deployment/xxxx-deployment   0/5           1         10        1          16s
xxxx-hpa   Deployment/xxxx-deployment   10/5          1         10        1          3m4s
xxxx-hpa   Deployment/xxxx-deployment   9/5           1         10        1          7m38s
xxxx-hpa   Deployment/xxxx-deployment   10/5          1         10        1          8m9s
As you can see, the replica count above never increased!
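For reference, the documented HPA algorithm computes `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, so with 1 replica at 10/5 the controller should have wanted 2 replicas. A minimal sketch of that arithmetic (not the controller's actual code, just the published formula):

```shell
# Values taken from the failing output above: 1 replica, metric 10, target 5.
current_replicas=1
current_metric=10
target_metric=5

# Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive integers.
desired=$(( (current_replicas * current_metric + target_metric - 1) / target_metric ))
echo "$desired"   # 2 -> the HPA should have gone from 1 to 2 replicas here
```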
And when I describe my HPA, it shows no scale-up events, even though the current value is greater than my target. It never scales up!!!
Name:               hpa_name
Namespace:          default
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"autoscaling/v2beta2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"hpa_name","name...
CreationTimestamp:  Thu, 04 Mar 2021 20:28:40 -0800
Reference:          Deployment/my_deployment
Metrics:            ( current / target )
  "plex_queue_size" on pods:  10 / 5
Min replicas:       1
Max replicas:       10
Deployment pods:    1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric my_metrics
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:             <none>
What is wrong here?
Could it be a problem with the EKS cluster???
Edit:
- Looking at the official docs:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
within a globally-configurable tolerance, from the --horizontal-pod-autoscaler-tolerance flag, which defaults to 0.1
I'd expect that even at 6/5 it would still scale up, since the ratio 1.2 is more than 0.1 away from 1.0.
- I clearly saw my HPA working before. Here is some evidence from 2 days ago when it worked:
NAME     REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa   Deployment/my-deployment   0/5       1         10        1          26s
my-hpa   Deployment/my-deployment   0/5       1         10        1          46s
my-hpa   Deployment/my-deployment   8/5       1         10        1          6m21s
my-hpa   Deployment/my-deployment   8/5       1         10        2          6m36s
my-hpa   Deployment/my-deployment   8/5       1         10        2          6m52s
my-hpa   Deployment/my-deployment   8/5       1         10        4          7m7s
my-hpa   Deployment/my-deployment   7/5       1         10        4          7m38s
my-hpa   Deployment/my-deployment   6750m/5   1         10        6          7m55s
But now it doesn't work. I tried spinning up a new HPA on other metrics, and that works. It's just this one. Strange...
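A side note on the `6750m` in the last row above: Kubernetes prints fractional quantities in milli-units, so `6750m` means an average of 6.75 per pod. A quick sketch of the conversion:

```shell
# Kubernetes renders fractional metric values with an "m" (milli) suffix: 6750m == 6.75.
v="6750m"
plain=$(awk "BEGIN { printf \"%g\", ${v%m} / 1000 }")   # strip the "m", divide by 1000
echo "$plain"   # 6.75
```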
New edit:
This may be due to the EKS cluster, as I can see:
kubectl get nodes
NAME                                           STATUS                     ROLES    AGE   VERSION
ip-172-27-177-146.us-west-2.compute.internal   Ready                      <none>   14h   v1.18.9-eks-d1db3c
ip-172-27-183-31.us-west-2.compute.internal    Ready,SchedulingDisabled   <none>   15h   v1.18.9-eks-d1db3c
Does SchedulingDisabled mean the cluster doesn't have enough room for new pods?
One thing that comes to mind is that your metrics-server might not be running correctly. Without data from the metrics-server, horizontal pod autoscaling won't work.
Figured it out. It was an EKS cluster issue. I had a resource limit of at most 2 on-demand nodes and at most 2 spot nodes; I needed to increase the number of cluster nodes.