Consul syncCatalog on k8s keeps falling into CrashLoopBackOff

I am deploying a Consul cluster on k8s version 1.9:

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:21:50Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3+coreos.0", GitCommit:"f588569ed1bd4a6c986205dd0d7b04da4ab1a3b6", GitTreeState:"clean", BuildDate:"2018-02-10T01:42:55Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

I am using hashicorp/consul-k8s:0.11.0 for syncCatalog.

Here is the description of my SyncCatalog deployment:

Namespace:              consul-presentation
CreationTimestamp:      Sun, 29 Mar 2020 20:22:49 +0300
Labels:                 app=consul
                        chart=consul-helm
                        heritage=Tiller
                        release=consul-presentation
Annotations:            deployment.kubernetes.io/revision=1
Selector:               app=consul,chart=consul-helm,component=sync-catalog,release=consul-presentation
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=consul
                    chart=consul-helm
                    component=sync-catalog
                    release=consul-presentation
  Annotations:      consul.hashicorp.com/connect-inject=false
  Service Account:  consul-presentation-consul-sync-catalog
  Containers:
   consul-sync-catalog:
    Image:  hashicorp/consul-k8s:0.11.0
    Port:   <none>
    Command:
      /bin/sh
      -ec
      consul-k8s sync-catalog \
  -k8s-default-sync=true \
  -consul-domain=consul \
  -k8s-write-namespace=${NAMESPACE} \
  -node-port-sync-type=ExternalFirst \
  -log-level=debug \
  -add-k8s-namespace-suffix \

    Liveness:   http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
    Readiness:  http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
    Environment:
      HOST_IP:            (v1:status.hostIP)
      NAMESPACE:          (v1:metadata.namespace)
      CONSUL_HTTP_ADDR:  http://consul-presentation.test:8500
    Mounts:              <none>
  Volumes:               <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   consul-presentation-consul-sync-catalog-66b5756486 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled up replica set consul-presentation-consul-sync-catalog-66b5756486 to 1

Here is the description of the unhealthy pod:

kubectl describe pod consul-presentation-consul-sync-catalog-66b5756486-2h2s6 -n consul-presentation                                                            
Name:           consul-presentation-consul-sync-catalog-66b5756486-2h2s6
Namespace:      consul-presentation
Node:           k8s-k4.test/10.99.1.10
Start Time:     Sun, 29 Mar 2020 20:22:49 +0300
Labels:         app=consul
                chart=consul-helm
                component=sync-catalog
                pod-template-hash=2261312042
                release=consul-presentation
Annotations:    consul.hashicorp.com/connect-inject=false
Status:         Running
IP:             10.195.5.53
Controlled By:  ReplicaSet/consul-presentation-consul-sync-catalog-66b5756486
Containers:
  consul-sync-catalog:
    Container ID:  docker://4f0c65a7be5f9b07cae51d798c532a066fb0784b28a7610dfe4f1a15a2fa5a7c
    Image:         hashicorp/consul-k8s:0.11.0
    Image ID:      docker-pullable://hashicorp/consul-k8s@sha256:8be1598ad3e71323509727162f20ed9c140c8cf09d5fa3dc351aad03ec2b0b70
    Port:          <none>
    Command:
      /bin/sh
      -ec
      consul-k8s sync-catalog \
  -k8s-default-sync=true \
  -consul-domain=consul \
  -k8s-write-namespace=${NAMESPACE} \
  -node-port-sync-type=ExternalFirst \
  -log-level=debug \
  -add-k8s-namespace-suffix \

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Sun, 29 Mar 2020 20:28:19 +0300
      Finished:     Sun, 29 Mar 2020 20:28:56 +0300
    Ready:          False
    Restart Count:  6
    Liveness:       http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
    Readiness:      http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
    Environment:
      HOST_IP:            (v1:status.hostIP)
      NAMESPACE:         consul-presentation (v1:metadata.namespace)
      CONSUL_HTTP_ADDR:  http://consul-presentation.test:8500
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from consul-presentation-consul-sync-catalog-token-jxw26 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  consul-presentation-consul-sync-catalog-token-jxw26:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  consul-presentation-consul-sync-catalog-token-jxw26
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age               From                      Message
  ----     ------                 ----              ----                      -------
  Normal   Scheduled              7m                default-scheduler         Successfully assigned consul-presentation-consul-sync-catalog-66b5756486-2h2s6 to k8s-k4.test
  Normal   SuccessfulMountVolume  7m                kubelet, k8s-k4.test  MountVolume.SetUp succeeded for volume "consul-presentation-consul-sync-catalog-token-jxw26"
  Normal   Pulled                 6m (x2 over 7m)   kubelet, k8s-k4.test  Container image "hashicorp/consul-k8s:0.11.0" already present on machine
  Normal   Created                6m (x2 over 7m)   kubelet, k8s-k4.test  Created container
  Normal   Started                6m (x2 over 7m)   kubelet, k8s-k4.test  Started container
  Normal   Killing                6m                kubelet, k8s-k4.test  Killing container with id docker://consul-sync-catalog:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy              6m (x4 over 6m)   kubelet, k8s-k4.test  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy              6m (x13 over 7m)  kubelet, k8s-k4.test  Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff                2m (x6 over 3m)   kubelet, k8s-k4.test  Back-off restarting failed container

I have already tried the default setup described in this Helm chart: https://github.com/hashicorp/consul-helm

The only difference is that I use ClusterIPs and ingresses, which should have nothing to do with the pod's health.

Any ideas?

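When using k8s ingresses with ClusterIPs, the Consul address should be set to the ingress host, without the port, because the ingress host is what is actually exposed. In other words, the corresponding part of the k8s deployment should look like this: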

Liveness:   http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
Readiness:  http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
Environment:
  HOST_IP:            (v1:status.hostIP)
  NAMESPACE:          (v1:metadata.namespace)
  CONSUL_HTTP_ADDR:  http://{INGRESS HOST}

The liveness probe failure tells you that the sync-catalog process cannot talk to Consul. The consul-k8s source shows how the liveness/readiness probes are implemented.
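To make the link between "cannot talk to Consul" and the `statuscode: 500` events concrete, here is a minimal sketch of a readiness endpoint of that kind. It is not the consul-k8s code: `make_ready_handler` and `check_consul` are hypothetical names, and the 204 success status is an assumption of this sketch (Kubernetes treats any status from 200 to 399 as a passing HTTP probe).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_ready_handler(check_consul):
    """Build a handler whose /health/ready mirrors check_consul().

    check_consul is a hypothetical zero-argument callable that returns
    True when Consul answered and False (or raises) when it did not.
    """

    class ReadyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health/ready":
                self.send_response(404)
                self.end_headers()
                return
            try:
                ok = bool(check_consul())
            except Exception:
                ok = False  # treat any error as "Consul unreachable"
            # 204: probe passes. 500: probe fails, and after enough
            # consecutive failures the kubelet restarts the container.
            self.send_response(204 if ok else 500)
            self.end_headers()

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return ReadyHandler
```

Serving it with `HTTPServer(("", 8080), make_ready_handler(lambda: False)).serve_forever()` would answer every probe with 500, which matches the pattern in the pod events: the container starts fine, the probes fail, and the kubelet kills and restarts it.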

The Consul address you are giving the sync-catalog process appears to be http://consul-presentation.test:8500. Is this an external Consul server? Is it running and reachable from pods on Kubernetes?
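One way to check reachability is to query Consul's `/v1/status/leader` HTTP endpoint from a pod in the same cluster. The sketch below assumes Python is available in that pod; `consul_leader` is a hypothetical helper name, and the address is whatever you have set in `CONSUL_HTTP_ADDR`.

```python
import json
import urllib.error
import urllib.request


def consul_leader(consul_http_addr, timeout=5.0):
    """Return the Raft leader address Consul reports, or None if unavailable."""
    url = consul_http_addr.rstrip("/") + "/v1/status/leader"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            leader = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, HTTP error status,
        # or a body that is not valid JSON.
        return None
    # Consul answers with a JSON string; an empty string means the
    # servers currently have no leader.
    return leader if isinstance(leader, str) and leader else None
```

If `consul_leader("http://consul-presentation.test:8500")` returns `None` from inside the cluster while the same call against the ingress host (without the port) returns a leader, the address is the problem rather than sync-catalog itself.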

Also, are you deploying Consul clients on k8s? In the official Helm chart, sync-catalog talks to the Consul clients, which are deployed as a DaemonSet, via hostIP.
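As a rough illustration of what "via hostIP" means for the address, the sketch below derives the Consul HTTP address from the same environment variables that appear in your pod spec. The precedence (an explicit `CONSUL_HTTP_ADDR` wins, otherwise fall back to the node-local client at `HOST_IP`) and the helper name `consul_addr_from_env` are assumptions of this sketch, not chart behavior; 8500 is Consul's default HTTP port.

```python
import os


def consul_addr_from_env(env=None, port=8500):
    """Pick the Consul HTTP address a sync process would talk to.

    Assumption for this sketch: an explicit CONSUL_HTTP_ADDR wins;
    otherwise use the client agent on the pod's own node (HOST_IP).
    """
    env = os.environ if env is None else env
    explicit = env.get("CONSUL_HTTP_ADDR", "").strip()
    if explicit:
        return explicit
    host_ip = env.get("HOST_IP", "").strip()
    if not host_ip:
        raise ValueError("neither CONSUL_HTTP_ADDR nor HOST_IP is set")
    return "http://%s:%d" % (host_ip, port)
```

With only `HOST_IP=10.99.1.10` set, this yields `http://10.99.1.10:8500`, which is only useful if a Consul client agent is actually listening on every node. In your deployment the explicit `CONSUL_HTTP_ADDR` is what gets used, so that is the address that has to be reachable from the pod.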