My AKS cluster was brought down; how can I recover?
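I have been load testing my application on a single-agent cluster in AKS. During the test, the connection to the dashboard stalled and never resumed. My application also appears to be down, so I assume the cluster is in a bad state.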
The API server is restate-f4cbd3d9.hcp.centralus.azmk8s.io
kubectl cluster-info dump shows the following error:
{
"name": "kube-dns-v20-6c8f7f988b-9wpx9.14fbbbd6bf60f0cf",
"namespace": "kube-system",
"selfLink": "/api/v1/namespaces/kube-system/events/kube-dns-v20-6c8f7f988b-9wpx9.14fbbbd6bf60f0cf",
"uid": "47f57d3c-d577-11e7-88d4-0a58ac1f0249",
"resourceVersion": "185572",
"creationTimestamp": "2017-11-30T02:36:34Z",
"InvolvedObject": {
"Kind": "Pod",
"Namespace": "kube-system",
"Name": "kube-dns-v20-6c8f7f988b-9wpx9",
"UID": "9d2b20f2-d3f5-11e7-88d4-0a58ac1f0249",
"APIVersion": "v1",
"ResourceVersion": "299",
"FieldPath": "spec.containers{kubedns}"
},
"Reason": "Unhealthy",
"Message": "Liveness probe failed: Get http://10.244.0.4:8080/healthz-kubedns: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)",
"Source": {
"Component": "kubelet",
"Host": "aks-agentpool-34912234-0"
},
"FirstTimestamp": "2017-11-30T02:23:50Z",
"LastTimestamp": "2017-11-30T02:59:00Z",
"Count": 6,
"Type": "Warning"
}
There are also a few pod sync errors in kube-system.
Example of the issue:
az aks browse -g REstate.Server -n REstate
Merged "REstate" as current context in C:\Users\User\AppData\Local\Temp\tmp29d0conq
Proxy running on http://127.0.0.1:8001/
Press CTRL+C to close the tunnel...
error: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection timed out
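You will probably need to ssh to the node to check whether the kubelet service is running. The "dial tcp 10.240.0.4:10250 ... connection timed out" error points the same way, since 10250 is the kubelet port on the node. In the future, you can set resource quotas so that a workload cannot exhaust all the resources on your cluster nodes.
Resource Quotas - https://kubernetes.io/docs/concepts/policy/resource-quotas/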
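As a rough illustration of the resource quota suggestion, a minimal ResourceQuota manifest might look like the sketch below. The quota name, the namespace, and every number are assumptions made up for this example, not values from the cluster in the question; they would need to be sized against what a single agent node can actually spare.

```yaml
# Sketch only: name, namespace, and limits are hypothetical placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: load-test-quota
  namespace: load-test        # assumed namespace where the application under test runs
spec:
  hard:
    requests.cpu: "1"         # total CPU requested by all pods in the namespace
    requests.memory: 1Gi      # total memory requested by all pods in the namespace
    limits.cpu: "2"           # total CPU limit across all pods in the namespace
    limits.memory: 2Gi        # total memory limit across all pods in the namespace
    pods: "10"                # maximum number of pods in the namespace
```

It could be applied with kubectl apply -f on the saved file. Note that once a quota covers CPU or memory, pods in that namespace must declare the matching requests and limits (or receive defaults from a LimitRange), otherwise their creation is rejected.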