Consul on Kubernetes: Consul pods are running but not ready

I'm using Kind to create a 3-node cluster inside an Ubuntu VM running on my Mac. The nodes are working as they should:

NAME                 STATUS   ROLES    AGE   VERSION
kind-control-plane   Ready    master   20h   v1.17.0
kind-worker          Ready    <none>   20h   v1.17.0
kind-worker2         Ready    <none>   20h   v1.17.0

I've installed Consul with the default Helm chart, following the official tutorial. The problem is that the Consul pods are either Running or Pending, and none of them ever become Ready:

NAME                        READY   STATUS    RESTARTS   AGE
busybox-6cd57fd969-9tzmf    1/1     Running   0          17h
hashicorp-consul-hgxdr      0/1     Running   0          18h
hashicorp-consul-server-0   0/1     Running   0          18h
hashicorp-consul-server-1   0/1     Running   0          18h
hashicorp-consul-server-2   0/1     Pending   0          18h
hashicorp-consul-vmsmt      0/1     Running   0          18h

Here is the full description of the pods:

Name:         busybox-6cd57fd969-9tzmf
Namespace:    default
Priority:     0
Node:         kind-worker2/172.17.0.4
Start Time:   Tue, 14 Jan 2020 17:45:03 +0800
Labels:       pod-template-hash=6cd57fd969
              run=busybox
Annotations:  <none>
Status:       Running
IP:           10.244.2.11
IPs:
  IP:           10.244.2.11
Controlled By:  ReplicaSet/busybox-6cd57fd969
Containers:
  busybox:
    Container ID:  containerd://710eba6a12607021098e3c376637476cd85faf86ac9abcf10f191126dc37026b
    Image:         busybox
    Image ID:      docker.io/library/busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
    Port:          <none>
    Host Port:     <none>
    Args:
      sh
    State:          Running
      Started:      Tue, 14 Jan 2020 21:00:50 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zszqr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-zszqr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-zszqr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>


Name:         hashicorp-consul-hgxdr
Namespace:    default
Priority:     0
Node:         kind-worker2/172.17.0.4
Start Time:   Tue, 14 Jan 2020 17:13:57 +0800
Labels:       app=consul
              chart=consul-helm
              component=client
              controller-revision-hash=6bc54657b6
              hasDNS=true
              pod-template-generation=1
              release=hashicorp
Annotations:  consul.hashicorp.com/connect-inject: false
Status:       Running
IP:           10.244.2.10
IPs:
  IP:           10.244.2.10
Controlled By:  DaemonSet/hashicorp-consul
Containers:
  consul:
    Container ID:  containerd://2209cfeaa740e3565213de6d0653dabbe9a8cbf1ffe085352a8e9d3a2d0452ec
    Image:         consul:1.6.2
    Image ID:      docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
    Ports:         8500/TCP, 8502/TCP, 8301/TCP, 8301/UDP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    8500/TCP, 8502/TCP, 0/TCP, 0/UDP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -node="${NODE}" \
        -advertise="${ADVERTISE_IP}" \
        -bind=0.0.0.0 \
        -client=0.0.0.0 \
        -node-meta=pod-name:${HOSTNAME} \
        -hcl="ports { grpc = 8502 }" \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -domain=consul

    State:          Running
      Started:      Tue, 14 Jan 2020 20:58:29 +0800
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ADVERTISE_IP:   (v1:status.podIP)
      NAMESPACE:     default (v1:metadata.namespace)
      NODE:           (v1:spec.nodeName)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-client-token-4r5tv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-client-config
    Optional:  false
  hashicorp-consul-client-token-4r5tv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-client-token-4r5tv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                   From                   Message
  ----     ------     ----                  ----                   -------
  Warning  Unhealthy  96s (x3206 over 14h)  kubelet, kind-worker2  Readiness probe failed:


Name:         hashicorp-consul-server-0
Namespace:    default
Priority:     0
Node:         kind-worker2/172.17.0.4
Start Time:   Tue, 14 Jan 2020 17:13:57 +0800
Labels:       app=consul
              chart=consul-helm
              component=server
              controller-revision-hash=hashicorp-consul-server-98f4fc994
              hasDNS=true
              release=hashicorp
              statefulset.kubernetes.io/pod-name=hashicorp-consul-server-0
Annotations:  consul.hashicorp.com/connect-inject: false
Status:       Running
IP:           10.244.2.9
IPs:
  IP:           10.244.2.9
Controlled By:  StatefulSet/hashicorp-consul-server
Containers:
  consul:
    Container ID:  containerd://72b7bf0e81d3ed477f76b357743e9429325da0f38ccf741f53c9587082cdfcd0
    Image:         consul:1.6.2
    Image ID:      docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
    Ports:         8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -advertise="${POD_IP}" \
        -bind=0.0.0.0 \
        -bootstrap-expect=3 \
        -client=0.0.0.0 \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -domain=consul \
        -hcl="connect { enabled = true }" \
        -ui \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -server

    State:          Running
      Started:      Tue, 14 Jan 2020 20:58:27 +0800
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data-default (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-server-token-hhdxc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data-default:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-default-hashicorp-consul-server-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-server-config
    Optional:  false
  hashicorp-consul-server-token-hhdxc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-server-token-hhdxc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                   Message
  ----     ------     ----                   ----                   -------
  Warning  Unhealthy  97s (x10686 over 14h)  kubelet, kind-worker2  Readiness probe failed:


Name:         hashicorp-consul-server-1
Namespace:    default
Priority:     0
Node:         kind-worker/172.17.0.3
Start Time:   Tue, 14 Jan 2020 17:13:57 +0800
Labels:       app=consul
              chart=consul-helm
              component=server
              controller-revision-hash=hashicorp-consul-server-98f4fc994
              hasDNS=true
              release=hashicorp
              statefulset.kubernetes.io/pod-name=hashicorp-consul-server-1
Annotations:  consul.hashicorp.com/connect-inject: false
Status:       Running
IP:           10.244.1.8
IPs:
  IP:           10.244.1.8
Controlled By:  StatefulSet/hashicorp-consul-server
Containers:
  consul:
    Container ID:  containerd://c1f5a88e30e545c75e58a730be5003cee93c823c21ebb29b22b79cd151164a15
    Image:         consul:1.6.2
    Image ID:      docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
    Ports:         8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -advertise="${POD_IP}" \
        -bind=0.0.0.0 \
        -bootstrap-expect=3 \
        -client=0.0.0.0 \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -domain=consul \
        -hcl="connect { enabled = true }" \
        -ui \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -server

    State:          Running
      Started:      Tue, 14 Jan 2020 20:58:36 +0800
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data-default (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-server-token-hhdxc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data-default:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-default-hashicorp-consul-server-1
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-server-config
    Optional:  false
  hashicorp-consul-server-token-hhdxc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-server-token-hhdxc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                  Message
  ----     ------     ----                   ----                  -------
  Warning  Unhealthy  95s (x10683 over 14h)  kubelet, kind-worker  Readiness probe failed:


Name:           hashicorp-consul-server-2
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=consul
                chart=consul-helm
                component=server
                controller-revision-hash=hashicorp-consul-server-98f4fc994
                hasDNS=true
                release=hashicorp
                statefulset.kubernetes.io/pod-name=hashicorp-consul-server-2
Annotations:    consul.hashicorp.com/connect-inject: false
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  StatefulSet/hashicorp-consul-server
Containers:
  consul:
    Image:       consul:1.6.2
    Ports:       8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -advertise="${POD_IP}" \
        -bind=0.0.0.0 \
        -bootstrap-expect=3 \
        -client=0.0.0.0 \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -domain=consul \
        -hcl="connect { enabled = true }" \
        -ui \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -server

    Readiness:  exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data-default (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-server-token-hhdxc (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  data-default:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-default-hashicorp-consul-server-2
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-server-config
    Optional:  false
  hashicorp-consul-server-token-hhdxc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-server-token-hhdxc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  63s (x434 over 18h)  default-scheduler  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity.


Name:         hashicorp-consul-vmsmt
Namespace:    default
Priority:     0
Node:         kind-worker/172.17.0.3
Start Time:   Tue, 14 Jan 2020 17:13:57 +0800
Labels:       app=consul
              chart=consul-helm
              component=client
              controller-revision-hash=6bc54657b6
              hasDNS=true
              pod-template-generation=1
              release=hashicorp
Annotations:  consul.hashicorp.com/connect-inject: false
Status:       Running
IP:           10.244.1.9
IPs:
  IP:           10.244.1.9
Controlled By:  DaemonSet/hashicorp-consul
Containers:
  consul:
    Container ID:  containerd://d502870f3476ea074b059361bc52a2c68ced551f5743b8448926bdaa319aabb0
    Image:         consul:1.6.2
    Image ID:      docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
    Ports:         8500/TCP, 8502/TCP, 8301/TCP, 8301/UDP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    8500/TCP, 8502/TCP, 0/TCP, 0/UDP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="hashicorp-consul"

      exec /bin/consul agent \
        -node="${NODE}" \
        -advertise="${ADVERTISE_IP}" \
        -bind=0.0.0.0 \
        -client=0.0.0.0 \
        -node-meta=pod-name:${HOSTNAME} \
        -hcl="ports { grpc = 8502 }" \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -domain=consul

    State:          Running
      Started:      Tue, 14 Jan 2020 20:58:35 +0800
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ADVERTISE_IP:   (v1:status.podIP)
      NAMESPACE:     default (v1:metadata.namespace)
      NODE:           (v1:spec.nodeName)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-client-token-4r5tv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hashicorp-consul-client-config
    Optional:  false
  hashicorp-consul-client-token-4r5tv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hashicorp-consul-client-token-4r5tv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                   From                  Message
  ----     ------     ----                  ----                  -------
  Warning  Unhealthy  88s (x3207 over 14h)  kubelet, kind-worker  Readiness probe failed:

For completeness, here is my kubelet status:

   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2020-01-15 10:59:06 +08; 1h 5min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 11910 (kubelet)
    Tasks: 17
   Memory: 50.3M
      CPU: 1min 16.431s
   CGroup: /system.slice/kubelet.service
           └─11910 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml 

Jan 15 12:04:41 ubuntu kubelet[11910]: E0115 12:04:41.610779   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:42 ubuntu kubelet[11910]: W0115 12:04:42.370702   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:04:46 ubuntu kubelet[11910]: E0115 12:04:46.612639   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:47 ubuntu kubelet[11910]: W0115 12:04:47.371621   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:04:51 ubuntu kubelet[11910]: E0115 12:04:51.614925   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:52 ubuntu kubelet[11910]: W0115 12:04:52.372164   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:04:56 ubuntu kubelet[11910]: E0115 12:04:56.616201   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:57 ubuntu kubelet[11910]: W0115 12:04:57.372364   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:05:01 ubuntu kubelet[11910]: E0115 12:05:01.617916   11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:05:02 ubuntu kubelet[11910]: W0115 12:05:02.372698   11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d

Any help is greatly appreciated.

I replicated your setup: I created a 3-node cluster (1 master and 2 worker nodes) and deployed Consul with Helm, and got the same result you're seeing. All pods are Running except one, which stays Pending.

In the StatefulSet object you can see a podAntiAffinity rule that doesn't allow two or more server pods to be scheduled on the same node. Since only your two worker nodes are schedulable, that's why one pod is stuck Pending.
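
You can inspect the rule directly with jsonpath (a quick check, assuming the release name hashicorp, as in your pod names):

kubectl get statefulset hashicorp-consul-server \
  -o jsonpath='{.spec.template.spec.affinity}'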

I came up with 4 ways you can make it work.

  1. The master node has a taint, node-role.kubernetes.io/master:NoSchedule, which prevents any pods from being scheduled on it. You can remove this taint by running kubectl taint node kind-control-plane node-role.kubernetes.io/master:NoSchedule- (note the trailing minus sign, which tells k8s to remove the taint). The scheduler will then be able to place the remaining consul-server pod on that node.

  2. You can add one more worker node.

  3. You can remove the podAntiAffinity from the consul-server StatefulSet so the scheduler won't care how the pods are placed.

  4. You can change requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution, so the affinity rule no longer has to be satisfied, only preferred (see the sketch after this list).
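
As a sketch of option 4 (values-override.yaml is a hypothetical file name; the labels mirror the chart's defaults quoted in the other answer, and the chart renders this string through tpl so the Go templates work here), the preferred form takes a weight and wraps the selector in podAffinityTerm:

# values-override.yaml -- hypothetical override file for option 4
server:
  affinity: |
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: {{ template "consul.name" . }}
                release: "{{ .Release.Name }}"
                component: server
            topologyKey: kubernetes.io/hostname

Then apply it with something like helm upgrade hashicorp ./consul-helm -f values-override.yaml.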

Let me know if it helps.

For Consul cluster fault tolerance, the recommended quorum size is 3 or 5 (with 3 servers, quorum is 2 and the cluster tolerates 1 failure; with 5, quorum is 3 and it tolerates 2); see the Deployment Table.

The default in the Helm chart gives a quorum of 3:

replicas (integer: 3) - The number of server agents to run.

affinity (string) - This value defines the affinity for server pods. It defaults to allowing only a single pod on each node, which minimizes risk of the cluster becoming unusable if a node is lost

See the affinity reference: "If you need to run more pods per node set this value to null."

So, to install Consul with a quorum of 3 while satisfying the affinity requirement of a production-grade deployment, you need at least 3 schedulable worker nodes (and if you chose to raise that value to 5, you would need 5 worker nodes).

The chart's values.yaml clearly documents which values to use when running on systems with fewer nodes.

By reducing the replica count:

~/test/consul-helm$ cat values.yaml | grep -i replica
  replicas: 3
  bootstrapExpect: 3 # Should <= replicas count
    # replicas. If you'd like a custom value, you can specify an override here.
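
For example (a sketch only, assuming Helm 3 and an install from a local checkout of the chart with release name hashicorp; a single server has no fault tolerance, so this suits dev clusters only):

helm install hashicorp ./consul-helm \
  --set server.replicas=1 \
  --set server.bootstrapExpect=1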

By disabling affinity:

~/test/consul-helm$ cat values.yaml | grep -i -A 8 affinity

  # Affinity Settings
  # Commenting out or setting as empty the affinity variable, will allow
  # deployment to single node services such as Minikube
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: {{ template "consul.name" . }}
              release: "{{ .Release.Name }}"
              component: server
          topologyKey: kubernetes.io/hostname
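
Following that comment, a minimal override (values-small.yaml is a hypothetical file name) that clears the affinity so several server pods can share a node:

# values-small.yaml -- hypothetical override
server:
  affinity: null

helm install hashicorp ./consul-helm -f values-small.yaml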