Consul syncCatalog on k8s keeps falling into CrashLoopBackOff
I am deploying a Consul cluster on Kubernetes 1.9:
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:21:50Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3+coreos.0", GitCommit:"f588569ed1bd4a6c986205dd0d7b04da4ab1a3b6", GitTreeState:"clean", BuildDate:"2018-02-10T01:42:55Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
I am using hashicorp/consul-k8s:0.11.0 for syncCatalog. Here is the description of my SyncCatalog deployment:
Namespace: consul-presentation
CreationTimestamp: Sun, 29 Mar 2020 20:22:49 +0300
Labels: app=consul
chart=consul-helm
heritage=Tiller
release=consul-presentation
Annotations: deployment.kubernetes.io/revision=1
Selector: app=consul,chart=consul-helm,component=sync-catalog,release=consul-presentation
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=consul
chart=consul-helm
component=sync-catalog
release=consul-presentation
Annotations: consul.hashicorp.com/connect-inject=false
Service Account: consul-presentation-consul-sync-catalog
Containers:
consul-sync-catalog:
Image: hashicorp/consul-k8s:0.11.0
Port: <none>
Command:
/bin/sh
-ec
consul-k8s sync-catalog \
-k8s-default-sync=true \
-consul-domain=consul \
-k8s-write-namespace=${NAMESPACE} \
-node-port-sync-type=ExternalFirst \
-log-level=debug \
-add-k8s-namespace-suffix \
Liveness: http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
Readiness: http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
Environment:
HOST_IP: (v1:status.hostIP)
NAMESPACE: (v1:metadata.namespace)
CONSUL_HTTP_ADDR: http://consul-presentation.test:8500
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing True ReplicaSetUpdated
OldReplicaSets: <none>
NewReplicaSet: consul-presentation-consul-sync-catalog-66b5756486 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 1m deployment-controller Scaled up replica set consul-presentation-consul-sync-catalog-66b5756486 to 1
And here is the description of the unhealthy pod:
kubectl describe pod consul-presentation-consul-sync-catalog-66b5756486-2h2s6 -n consul-presentation
Name: consul-presentation-consul-sync-catalog-66b5756486-2h2s6
Namespace: consul-presentation
Node: k8s-k4.test/10.99.1.10
Start Time: Sun, 29 Mar 2020 20:22:49 +0300
Labels: app=consul
chart=consul-helm
component=sync-catalog
pod-template-hash=2261312042
release=consul-presentation
Annotations: consul.hashicorp.com/connect-inject=false
Status: Running
IP: 10.195.5.53
Controlled By: ReplicaSet/consul-presentation-consul-sync-catalog-66b5756486
Containers:
consul-sync-catalog:
Container ID: docker://4f0c65a7be5f9b07cae51d798c532a066fb0784b28a7610dfe4f1a15a2fa5a7c
Image: hashicorp/consul-k8s:0.11.0
Image ID: docker-pullable://hashicorp/consul-k8s@sha256:8be1598ad3e71323509727162f20ed9c140c8cf09d5fa3dc351aad03ec2b0b70
Port: <none>
Command:
/bin/sh
-ec
consul-k8s sync-catalog \
-k8s-default-sync=true \
-consul-domain=consul \
-k8s-write-namespace=${NAMESPACE} \
-node-port-sync-type=ExternalFirst \
-log-level=debug \
-add-k8s-namespace-suffix \
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sun, 29 Mar 2020 20:28:19 +0300
Finished: Sun, 29 Mar 2020 20:28:56 +0300
Ready: False
Restart Count: 6
Liveness: http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
Readiness: http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
Environment:
HOST_IP: (v1:status.hostIP)
NAMESPACE: consul-presentation (v1:metadata.namespace)
CONSUL_HTTP_ADDR: http://consul-presentation.test:8500
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from consul-presentation-consul-sync-catalog-token-jxw26 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
consul-presentation-consul-sync-catalog-token-jxw26:
Type: Secret (a volume populated by a Secret)
SecretName: consul-presentation-consul-sync-catalog-token-jxw26
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m default-scheduler Successfully assigned consul-presentation-consul-sync-catalog-66b5756486-2h2s6 to k8s-k4.test
Normal SuccessfulMountVolume 7m kubelet, k8s-k4.test MountVolume.SetUp succeeded for volume "consul-presentation-consul-sync-catalog-token-jxw26"
Normal Pulled 6m (x2 over 7m) kubelet, k8s-k4.test Container image "hashicorp/consul-k8s:0.11.0" already present on machine
Normal Created 6m (x2 over 7m) kubelet, k8s-k4.test Created container
Normal Started 6m (x2 over 7m) kubelet, k8s-k4.test Started container
Normal Killing 6m kubelet, k8s-k4.test Killing container with id docker://consul-sync-catalog:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 6m (x4 over 6m) kubelet, k8s-k4.test Liveness probe failed: HTTP probe failed with statuscode: 500
Warning Unhealthy 6m (x13 over 7m) kubelet, k8s-k4.test Readiness probe failed: HTTP probe failed with statuscode: 500
Warning BackOff 2m (x6 over 3m) kubelet, k8s-k4.test Back-off restarting failed container
I have already tried the default setup described in this Helm chart: https://github.com/hashicorp/consul-helm
The only difference is that I use ClusterIPs and an ingress, which should be unrelated to the pod's health.
Any ideas?
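Since the container exits with code 2 before the probes ever pass, a first debugging step (a sketch; the resource names are taken from the describe output above, and the curl image is an assumption) is to read the container's own logs and test whether the configured Consul address is reachable from inside the cluster:

```shell
# Why is the sync-catalog process exiting? Its own logs usually say.
kubectl logs deploy/consul-presentation-consul-sync-catalog \
  -n consul-presentation --previous

# Can the configured Consul address be reached from a pod in the same
# namespace? (curlimages/curl is just one convenient image with curl in it.)
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl \
  -n consul-presentation -- \
  curl -sS http://consul-presentation.test:8500/v1/status/leader
```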
When using Kubernetes ingresses with ClusterIPs, the Consul address should be set to the ingress host, since that is what is actually exposed, with no port. In other words, the corresponding part of the k8s deployment should look like this:
Liveness: http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
Readiness: http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
Environment:
HOST_IP: (v1:status.hostIP)
NAMESPACE: (v1:metadata.namespace)
CONSUL_HTTP_ADDR: http://{INGRESS HOST}
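If the deployment is being adjusted directly rather than by re-rendering the chart, the environment variable can be patched in place. A sketch, where consul.example.com is a placeholder for the actual ingress host:

```shell
# Point the sync-catalog process at the ingress host (no port, since the
# ingress exposes Consul's HTTP API on the default HTTP port).
kubectl set env deployment/consul-presentation-consul-sync-catalog \
  -n consul-presentation \
  CONSUL_HTTP_ADDR=http://consul.example.com
```

Note that re-rendering the Helm chart would overwrite a direct patch like this, so it is best treated as a way to confirm the fix before changing the chart values.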
The failing liveness probe tells you that the sync-catalog process cannot talk to Consul. Here is how the liveness/readiness probes are implemented in consul-k8s.
The Consul address you are giving the sync-catalog process appears to be http://consul-presentation.test:8500. Is this an external Consul server? Is it running, and is it reachable from the pods on Kubernetes?
Also, are you deploying Consul clients on Kubernetes? In the official Helm chart, sync-catalog talks to the Consul clients, which are deployed as a DaemonSet, via the hostIP.
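To check which of the two setups applies, one can look for a client DaemonSet and probe the agent port on a node directly. A sketch, assuming the default client HTTP port 8500; the node IP 10.99.1.10 is taken from the pod description above, and the label selector is an assumption based on the chart's labels:

```shell
# Is a Consul client DaemonSet running in this release at all?
kubectl get daemonset -n consul-presentation -l app=consul

# Is an agent answering on the node's host IP? (Run from a machine with
# network access to the nodes.)
curl -sS http://10.99.1.10:8500/v1/agent/self | head -c 200
```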