Kops kubenet cluster autoscaling not working

I have a kops cluster with a maximum of 75 nodes, and I added the cluster autoscaler. It uses kubenet networking. It has now stopped working: scale-down no longer happens.

The cluster is running at its maximum capacity of 75 nodes even though there is almost no load. I am not sure where to start troubleshooting.

I see the following messages in the cluster autoscaler pod:

    I0222 01:45:14.327164       1 static_autoscaler.go:97] Starting main loop
    W0222 01:45:14.770818       1 static_autoscaler.go:150] Cluster is not ready for autoscaling
    I0222 01:45:15.043126       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
    I0222 01:45:17.121507       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
    I0222 01:45:19.126665       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
    I0222 01:45:21.327581       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
    I0222 01:45:23.331802       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
    I0222 01:45:24.775124       1 static_autoscaler.go:97] Starting main loop
    W0222 01:45:25.085442       1 static_autoscaler.go:150] Cluster is not ready for autoscaling
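
To separate the warnings from the leader-election noise in a log like this, something along these lines may help. It is only a sketch: it assumes the usual klog line layout (severity letter, month and day, time, thread id, source file and line, then the message), and the name `summarize_warnings` is made up for illustration.

```python
import re
from collections import Counter

# Assumed klog header layout: <severity><mmdd> <time> <thread> <file:line>] <message>
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)


def summarize_warnings(log_text):
    """Count warning, error and fatal lines, grouped by (source, message)."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = KLOG_LINE.match(raw.strip())
        if match and match.group("severity") in "WEF":
            counts[(match.group("source"), match.group("message"))] += 1
    return counts


# A few lines copied from the log above.
sample = """\
I0222 01:45:14.327164       1 static_autoscaler.go:97] Starting main loop
W0222 01:45:14.770818       1 static_autoscaler.go:150] Cluster is not ready for autoscaling
I0222 01:45:15.043126       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
W0222 01:45:25.085442       1 static_autoscaler.go:150] Cluster is not ready for autoscaling
"""

for (source, message), n in summarize_warnings(sample).items():
    print(f"{n}x {source}: {message}")
```

On this log the only non-info message is the repeated "Cluster is not ready for autoscaling" warning, which suggests looking at node health rather than at the autoscaler itself.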

Scaling up works fine.

Update: I am also seeing the following errors when running kops validate cluster:

    VALIDATION ERRORS
    KIND    NAME                MESSAGE
    Node    ip-172-20-32-173.ec2.internal   node "ip-172-20-32-173.ec2.internal" is not ready
    ...

    I0221 22:16:02.688911    2403 node_conditions.go:60] node "ip-172-20-51-238.ec2.internal" not ready: &NodeCondition{Type:NetworkUnavailable,Status:True,LastHeartbeatTime:2019-02-21 22:15:56 -0500 EST,LastTransitionTime:2019-02-21 22:15:56 -0500 EST,Reason:NoRouteCreated,Message:RouteController failed to create a route,}
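
To list every node in this state at once, the node conditions can be read from the output of kubectl get nodes -o json. The following is a rough sketch, not a tested tool: `nodes_without_routes` is a hypothetical helper, and the sample data is hand-written, with one node taken from the error above and one healthy node invented for contrast.

```python
import json


def nodes_without_routes(node_list):
    """Return (node name, reason) for nodes whose NetworkUnavailable condition is True."""
    affected = []
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        for cond in node.get("status", {}).get("conditions", []):
            if cond.get("type") == "NetworkUnavailable" and cond.get("status") == "True":
                affected.append((name, cond.get("reason", "")))
    return affected


# Hand-written sample in the shape of `kubectl get nodes -o json`.
# The second node name is hypothetical and only there as a healthy counterexample.
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "ip-172-20-51-238.ec2.internal"},
      "status": {"conditions": [
        {"type": "NetworkUnavailable", "status": "True", "reason": "NoRouteCreated"}
      ]}
    },
    {
      "metadata": {"name": "ip-172-20-0-1.ec2.internal"},
      "status": {"conditions": [
        {"type": "NetworkUnavailable", "status": "False", "reason": "RouteCreated"}
      ]}
    }
  ]
}
""")

for name, reason in nodes_without_routes(sample):
    print(f"{name}: {reason}")
```

With real data, the JSON could be saved to a file first and loaded with json.load instead of the inline sample.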

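I found that the problem was that my cluster had gotten into an unhealthy state because of this limitation in AWS VPC route tables. My cluster had scaled up to 75 nodes, then became unhealthy and was not able to scale down.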

From the link:

One important limitation when using kubenet networking is that an AWS routing table cannot have more than 50 entries, which sets a limit of 50 nodes per cluster.
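
The arithmetic behind this is simple enough to sketch. The snippet assumes one route per node, as the quote implies, and takes the 50-entry limit from the quote as a default; `route_table_headroom` and the `other_routes` parameter (routes in the table that do not belong to nodes) are made up for illustration.

```python
def route_table_headroom(node_count, route_limit=50, other_routes=0):
    """Routes left in the table, assuming each node needs exactly one route.

    A negative result is the number of nodes that cannot get a route.
    """
    return route_limit - other_routes - node_count


max_nodes = 75  # maximum size of the cluster in the question
headroom = route_table_headroom(max_nodes)
if headroom < 0:
    print(f"{-headroom} of {max_nodes} nodes cannot get a route")
else:
    print(f"{headroom} routes still free")
```

Under these assumptions, a cluster that scales to 75 nodes leaves 25 nodes without a route, which would match the NoRouteCreated condition seen above. Capping the autoscaler's maximum node count below the route limit should keep the cluster out of this state.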