Why does apply get stuck at module.eks.aws_autoscaling_group.workers[0]: Refreshing state?
I'm trying to deploy EKS following the official docs: https://learn.hashicorp.com/terraform/kubernetes/provision-aks-cluster
The deployment went fine, and then I added the helm/redis chart on top of it. Now, when I run terraform apply, it gets stuck while refreshing state:
module.eks.aws_iam_instance_profile.workers[0]: Refreshing state... [id=cluster1234]
module.vpc.aws_route.private_nat_gateway[0]: Refreshing state... [id=r-rtb-1234]
module.eks.aws_security_group_rule.workers_ingress_cluster_https[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_security_group_rule.workers_ingress_cluster[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_security_group_rule.workers_egress_internet[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_security_group_rule.cluster_https_worker_ingress[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_security_group_rule.workers_ingress_self[0]: Refreshing state... [id=sgrule-1234]
module.eks.aws_launch_configuration.workers[0]: Refreshing state... [id=cluster-worker-group-1234]
module.eks.kubernetes_config_map.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]
module.eks.data.null_data_source.node_groups[0]: Refreshing state...
module.eks.random_pet.workers[0]: Refreshing state... [id=diverse-vervet]
module.eks.aws_autoscaling_group.workers[0]: Refreshing state... [id=cluster-worker-group-1234]
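One thing worth checking when apply stalls right after these refresh lines is whether the kubernetes/helm providers can actually reach the cluster, since kubernetes_config_map.aws_auth and any helm_release resources are refreshed over the EKS API. A minimal sanity check, assuming the cluster name and region below are replaced with your own (they are placeholders, not values from this deployment):
$> aws eks update-kubeconfig --name my-cluster --region us-east-1   # placeholder name/region
$> kubectl cluster-info                                             # if this hangs or errors, the providers will hang too
$> kubectl -n kube-system get configmap aws-auth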
I've tried leaving it alone for a few hours, then a few more, and I've also tried deleting everything and redeploying, but it seems like this is a bug or an error?
Event log during terraform apply:
$> kubectl -n infra get events --sort-by='{.lastTimestamp}'
LAST SEEN TYPE REASON OBJECT MESSAGE
58m Normal Pulled pod/redis-master-0 Container image "docker.io/oliver006/redis_exporter:v1.0.3" already present on machine
28m Warning Unhealthy pod/redis-slave-0 Readiness probe failed:
Could not connect to Redis at redis-master-0.redis-headless.infra.svc.cluster.local:6379: Name or service not known
13m Warning Unhealthy pod/redis-slave-0 Readiness probe failed:
Could not connect to Redis at redis-master-0.redis-headless.infra.svc.cluster.local:6379: Name or service not known
3m31s Warning BackOff pod/redis-slave-0 Back-off restarting failed container
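The readiness-probe failures above are DNS failures: redis-slave-0 cannot resolve redis-master-0.redis-headless.infra.svc.cluster.local, which usually means the headless service has no ready endpoint for the master (or cluster DNS is not working). A quick way to confirm, using the service and namespace names from the events (the dns-test pod below is just a throwaway name for illustration):
$> kubectl -n infra get svc,endpoints redis-headless
$> kubectl -n infra get pod redis-master-0 -o wide
$> kubectl -n infra run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
     nslookup redis-master-0.redis-headless.infra.svc.cluster.local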
After doing:
export TF_LOG=TRACE
and running terraform apply again, I also found this:
2020/05/18 01:10:43 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "helm_release.prom-operator"
2020/05/18 01:10:46 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/18 01:10:48 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "helm_release.prom-operator"
2020/05/18 01:10:51 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/18 01:10:53 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "helm_release.prom-operator"
2020/05/18 01:10:56 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/18 01:10:58 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "helm_release.prom-operator"
I can't figure out what is wrong with prometheus now, or how these two things are related..
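Those TRACE lines show that Terraform itself is done walking the graph and is only waiting for the helm provider to finish helm_release.prom-operator; by default the provider waits (like helm --wait) until everything in the release is ready, so a release whose pods never become Ready keeps apply hanging. A hedged way to look at the stuck release while apply is running (the release name is taken from the Terraform resource address and the namespace is a guess, so check helm list first):
$> helm list --all-namespaces --all      # stuck releases usually show a pending-install/pending-upgrade status
$> helm status prom-operator -n monitoring
$> kubectl -n monitoring get pods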
I'm still trying to get the cluster deployed properly with tf, but for now the problem above is gone. After running terraform apply with
export TF_LOG=TRACE
I found the chart that was stuck, and helm delete solved the problem. Happy debugging!
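For anyone hitting the same thing, the cleanup is roughly the sequence below (the release name and namespace are guesses based on the trace output, so check what helm list actually shows first):
$> helm list --all-namespaces --all
$> helm delete prom-operator -n monitoring   # use the name/namespace that helm list prints
$> terraform apply                           # re-run; Terraform should recreate the helm_release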