Consul on Kubernetes: Consul pods are running but not ready
I'm using Kind to create a 3-node cluster in an Ubuntu VM running on my Mac. The nodes work as they should:
NAME                 STATUS   ROLES    AGE   VERSION
kind-control-plane   Ready    master   20h   v1.17.0
kind-worker          Ready    <none>   20h   v1.17.0
kind-worker2         Ready    <none>   20h   v1.17.0
I have installed Consul using the official tutorial and the default Helm chart. The problem is that the Consul pods are either Running or Pending, and none of them are ready:
NAME                        READY   STATUS    RESTARTS   AGE
busybox-6cd57fd969-9tzmf    1/1     Running   0          17h
hashicorp-consul-hgxdr      0/1     Running   0          18h
hashicorp-consul-server-0   0/1     Running   0          18h
hashicorp-consul-server-1   0/1     Running   0          18h
hashicorp-consul-server-2   0/1     Pending   0          18h
hashicorp-consul-vmsmt      0/1     Running   0          18h
Here is the full description of the pods:
Name: busybox-6cd57fd969-9tzmf
Namespace: default
Priority: 0
Node: kind-worker2/172.17.0.4
Start Time: Tue, 14 Jan 2020 17:45:03 +0800
Labels: pod-template-hash=6cd57fd969
run=busybox
Annotations: <none>
Status: Running
IP: 10.244.2.11
IPs:
IP: 10.244.2.11
Controlled By: ReplicaSet/busybox-6cd57fd969
Containers:
busybox:
Container ID: containerd://710eba6a12607021098e3c376637476cd85faf86ac9abcf10f191126dc37026b
Image: busybox
Image ID: docker.io/library/busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
Port: <none>
Host Port: <none>
Args:
sh
State: Running
Started: Tue, 14 Jan 2020 21:00:50 +0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zszqr (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-zszqr:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-zszqr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Name: hashicorp-consul-hgxdr
Namespace: default
Priority: 0
Node: kind-worker2/172.17.0.4
Start Time: Tue, 14 Jan 2020 17:13:57 +0800
Labels: app=consul
chart=consul-helm
component=client
controller-revision-hash=6bc54657b6
hasDNS=true
pod-template-generation=1
release=hashicorp
Annotations: consul.hashicorp.com/connect-inject: false
Status: Running
IP: 10.244.2.10
IPs:
IP: 10.244.2.10
Controlled By: DaemonSet/hashicorp-consul
Containers:
consul:
Container ID: containerd://2209cfeaa740e3565213de6d0653dabbe9a8cbf1ffe085352a8e9d3a2d0452ec
Image: consul:1.6.2
Image ID: docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
Ports: 8500/TCP, 8502/TCP, 8301/TCP, 8301/UDP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
Host Ports: 8500/TCP, 8502/TCP, 0/TCP, 0/UDP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
Command:
/bin/sh
-ec
CONSUL_FULLNAME="hashicorp-consul"
exec /bin/consul agent \
-node="${NODE}" \
-advertise="${ADVERTISE_IP}" \
-bind=0.0.0.0 \
-client=0.0.0.0 \
-node-meta=pod-name:${HOSTNAME} \
-hcl="ports { grpc = 8502 }" \
-config-dir=/consul/config \
-datacenter=dc1 \
-data-dir=/consul/data \
-retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-domain=consul
State: Running
Started: Tue, 14 Jan 2020 20:58:29 +0800
Ready: False
Restart Count: 0
Readiness: exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ADVERTISE_IP: (v1:status.podIP)
NAMESPACE: default (v1:metadata.namespace)
NODE: (v1:spec.nodeName)
Mounts:
/consul/config from config (rw)
/consul/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-client-token-4r5tv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hashicorp-consul-client-config
Optional: false
hashicorp-consul-client-token-4r5tv:
Type: Secret (a volume populated by a Secret)
SecretName: hashicorp-consul-client-token-4r5tv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 96s (x3206 over 14h) kubelet, kind-worker2 Readiness probe failed:
Name: hashicorp-consul-server-0
Namespace: default
Priority: 0
Node: kind-worker2/172.17.0.4
Start Time: Tue, 14 Jan 2020 17:13:57 +0800
Labels: app=consul
chart=consul-helm
component=server
controller-revision-hash=hashicorp-consul-server-98f4fc994
hasDNS=true
release=hashicorp
statefulset.kubernetes.io/pod-name=hashicorp-consul-server-0
Annotations: consul.hashicorp.com/connect-inject: false
Status: Running
IP: 10.244.2.9
IPs:
IP: 10.244.2.9
Controlled By: StatefulSet/hashicorp-consul-server
Containers:
consul:
Container ID: containerd://72b7bf0e81d3ed477f76b357743e9429325da0f38ccf741f53c9587082cdfcd0
Image: consul:1.6.2
Image ID: docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
Ports: 8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
Command:
/bin/sh
-ec
CONSUL_FULLNAME="hashicorp-consul"
exec /bin/consul agent \
-advertise="${POD_IP}" \
-bind=0.0.0.0 \
-bootstrap-expect=3 \
-client=0.0.0.0 \
-config-dir=/consul/config \
-datacenter=dc1 \
-data-dir=/consul/data \
-domain=consul \
-hcl="connect { enabled = true }" \
-ui \
-retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-server
State: Running
Started: Tue, 14 Jan 2020 20:58:27 +0800
Ready: False
Restart Count: 0
Readiness: exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
Environment:
POD_IP: (v1:status.podIP)
NAMESPACE: default (v1:metadata.namespace)
Mounts:
/consul/config from config (rw)
/consul/data from data-default (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-server-token-hhdxc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data-default:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-default-hashicorp-consul-server-0
ReadOnly: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hashicorp-consul-server-config
Optional: false
hashicorp-consul-server-token-hhdxc:
Type: Secret (a volume populated by a Secret)
SecretName: hashicorp-consul-server-token-hhdxc
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 97s (x10686 over 14h) kubelet, kind-worker2 Readiness probe failed:
Name: hashicorp-consul-server-1
Namespace: default
Priority: 0
Node: kind-worker/172.17.0.3
Start Time: Tue, 14 Jan 2020 17:13:57 +0800
Labels: app=consul
chart=consul-helm
component=server
controller-revision-hash=hashicorp-consul-server-98f4fc994
hasDNS=true
release=hashicorp
statefulset.kubernetes.io/pod-name=hashicorp-consul-server-1
Annotations: consul.hashicorp.com/connect-inject: false
Status: Running
IP: 10.244.1.8
IPs:
IP: 10.244.1.8
Controlled By: StatefulSet/hashicorp-consul-server
Containers:
consul:
Container ID: containerd://c1f5a88e30e545c75e58a730be5003cee93c823c21ebb29b22b79cd151164a15
Image: consul:1.6.2
Image ID: docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
Ports: 8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
Command:
/bin/sh
-ec
CONSUL_FULLNAME="hashicorp-consul"
exec /bin/consul agent \
-advertise="${POD_IP}" \
-bind=0.0.0.0 \
-bootstrap-expect=3 \
-client=0.0.0.0 \
-config-dir=/consul/config \
-datacenter=dc1 \
-data-dir=/consul/data \
-domain=consul \
-hcl="connect { enabled = true }" \
-ui \
-retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-server
State: Running
Started: Tue, 14 Jan 2020 20:58:36 +0800
Ready: False
Restart Count: 0
Readiness: exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
Environment:
POD_IP: (v1:status.podIP)
NAMESPACE: default (v1:metadata.namespace)
Mounts:
/consul/config from config (rw)
/consul/data from data-default (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-server-token-hhdxc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data-default:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-default-hashicorp-consul-server-1
ReadOnly: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hashicorp-consul-server-config
Optional: false
hashicorp-consul-server-token-hhdxc:
Type: Secret (a volume populated by a Secret)
SecretName: hashicorp-consul-server-token-hhdxc
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 95s (x10683 over 14h) kubelet, kind-worker Readiness probe failed:
Name: hashicorp-consul-server-2
Namespace: default
Priority: 0
Node: <none>
Labels: app=consul
chart=consul-helm
component=server
controller-revision-hash=hashicorp-consul-server-98f4fc994
hasDNS=true
release=hashicorp
statefulset.kubernetes.io/pod-name=hashicorp-consul-server-2
Annotations: consul.hashicorp.com/connect-inject: false
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/hashicorp-consul-server
Containers:
consul:
Image: consul:1.6.2
Ports: 8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
Command:
/bin/sh
-ec
CONSUL_FULLNAME="hashicorp-consul"
exec /bin/consul agent \
-advertise="${POD_IP}" \
-bind=0.0.0.0 \
-bootstrap-expect=3 \
-client=0.0.0.0 \
-config-dir=/consul/config \
-datacenter=dc1 \
-data-dir=/consul/data \
-domain=consul \
-hcl="connect { enabled = true }" \
-ui \
-retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-server
Readiness: exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
Environment:
POD_IP: (v1:status.podIP)
NAMESPACE: default (v1:metadata.namespace)
Mounts:
/consul/config from config (rw)
/consul/data from data-default (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-server-token-hhdxc (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
data-default:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-default-hashicorp-consul-server-2
ReadOnly: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hashicorp-consul-server-config
Optional: false
hashicorp-consul-server-token-hhdxc:
Type: Secret (a volume populated by a Secret)
SecretName: hashicorp-consul-server-token-hhdxc
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 63s (x434 over 18h) default-scheduler 0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity.
Name: hashicorp-consul-vmsmt
Namespace: default
Priority: 0
Node: kind-worker/172.17.0.3
Start Time: Tue, 14 Jan 2020 17:13:57 +0800
Labels: app=consul
chart=consul-helm
component=client
controller-revision-hash=6bc54657b6
hasDNS=true
pod-template-generation=1
release=hashicorp
Annotations: consul.hashicorp.com/connect-inject: false
Status: Running
IP: 10.244.1.9
IPs:
IP: 10.244.1.9
Controlled By: DaemonSet/hashicorp-consul
Containers:
consul:
Container ID: containerd://d502870f3476ea074b059361bc52a2c68ced551f5743b8448926bdaa319aabb0
Image: consul:1.6.2
Image ID: docker.io/library/consul@sha256:a167e7222c84687c3e7f392f13b23d9f391cac80b6b839052e58617dab714805
Ports: 8500/TCP, 8502/TCP, 8301/TCP, 8301/UDP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
Host Ports: 8500/TCP, 8502/TCP, 0/TCP, 0/UDP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
Command:
/bin/sh
-ec
CONSUL_FULLNAME="hashicorp-consul"
exec /bin/consul agent \
-node="${NODE}" \
-advertise="${ADVERTISE_IP}" \
-bind=0.0.0.0 \
-client=0.0.0.0 \
-node-meta=pod-name:${HOSTNAME} \
-hcl="ports { grpc = 8502 }" \
-config-dir=/consul/config \
-datacenter=dc1 \
-data-dir=/consul/data \
-retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
-domain=consul
State: Running
Started: Tue, 14 Jan 2020 20:58:35 +0800
Ready: False
Restart Count: 0
Readiness: exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader 2>/dev/null | \
grep -E '".+"'
] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ADVERTISE_IP: (v1:status.podIP)
NAMESPACE: default (v1:metadata.namespace)
NODE: (v1:spec.nodeName)
Mounts:
/consul/config from config (rw)
/consul/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hashicorp-consul-client-token-4r5tv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hashicorp-consul-client-config
Optional: false
hashicorp-consul-client-token-4r5tv:
Type: Secret (a volume populated by a Secret)
SecretName: hashicorp-consul-client-token-4r5tv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 88s (x3207 over 14h) kubelet, kind-worker Readiness probe failed:
For completeness, here is my kubelet status:
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2020-01-15 10:59:06 +08; 1h 5min ago
Docs: https://kubernetes.io/docs/home/
Main PID: 11910 (kubelet)
Tasks: 17
Memory: 50.3M
CPU: 1min 16.431s
CGroup: /system.slice/kubelet.service
└─11910 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml
Jan 15 12:04:41 ubuntu kubelet[11910]: E0115 12:04:41.610779 11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:42 ubuntu kubelet[11910]: W0115 12:04:42.370702 11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:04:46 ubuntu kubelet[11910]: E0115 12:04:46.612639 11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:47 ubuntu kubelet[11910]: W0115 12:04:47.371621 11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:04:51 ubuntu kubelet[11910]: E0115 12:04:51.614925 11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:52 ubuntu kubelet[11910]: W0115 12:04:52.372164 11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:04:56 ubuntu kubelet[11910]: E0115 12:04:56.616201 11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:04:57 ubuntu kubelet[11910]: W0115 12:04:57.372364 11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Jan 15 12:05:01 ubuntu kubelet[11910]: E0115 12:05:01.617916 11910 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message
Jan 15 12:05:02 ubuntu kubelet[11910]: W0115 12:05:02.372698 11910 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Any help would be greatly appreciated.
I replicated your setup, creating a 3-node cluster (1 master and 2 workers) and deploying Consul with Helm, and got the same result you are seeing: all pods are Running except one, which stays Pending.
In the StatefulSet object you can see a podAntiAffinity rule that forbids scheduling two or more server pods on the same node. That is why one pod is stuck Pending. And since the servers start with -bootstrap-expect=3 but only two of them can be scheduled, the cluster never elects a leader, so the readiness probe (which greps the /v1/status/leader response for a leader address) fails on every agent.
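You can confirm the missing leader from inside one of the server pods; this runs the same check the readiness probe uses:

# Ask a server agent who the current leader is; an empty string in the
# response means no leader has been elected yet.
kubectl exec hashicorp-consul-server-0 -- \
  curl -s http://127.0.0.1:8500/v1/status/leader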
I came up with 4 ways you can make it work:
1. The master node has a taint, node-role.kubernetes.io/master:NoSchedule, which prevents any pods from being scheduled on it. You can remove this taint by running:
kubectl taint node kind-control-plane node-role.kubernetes.io/master:NoSchedule-
(Note the minus sign at the end; it tells Kubernetes to remove the taint.) The scheduler will then be able to place the remaining consul-server pod on this node.
2. You can add one more worker node.
3. You can remove the podAntiAffinity from the consul-server StatefulSet object so the scheduler won't care how the pods are placed.
4. Change requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution, so the affinity rule does not have to be satisfied, only preferred. A sketch of that change follows below.
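Here is a minimal sketch of option 4 applied to the server pod template (the label values come from your pod descriptions; the weight field is required by the preferred form):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: consul
            release: hashicorp
            component: server
        topologyKey: kubernetes.io/hostname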
Let me know if this helps.
For Consul cluster fault tolerance, the recommended quorum size is 3 or 5; refer to the Deployment Table.
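For reference, the relevant rows of that table are:

Servers   Quorum Size   Failure Tolerance
1         1             0
3         2             1
5         3             2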
The default quorum size in the Helm chart is 3:

replicas (integer: 3) - The number of server agents to run.
affinity (string) - This value defines the affinity for server pods. It defaults to allowing only a single pod on each node, which minimizes risk of the cluster becoming unusable if a node is lost.

As the affinity reference says: "If you need to run more pods per node set this value to null."
So you need at least 3 schedulable worker nodes to satisfy the affinity requirement and install Consul with a quorum of 3 in a production-grade deployment. (In fact, you would need to increase the worker count to 5 if you chose to raise the value to 5.)
The chart's values.yaml clearly documents what values to use when running on systems with fewer nodes:

By reducing the replica count:
~/test/consul-helm$ cat values.yaml | grep -i replica
replicas: 3
bootstrapExpect: 3 # Should <= replicas count
# replicas. If you'd like a custom value, you can specify an override here.
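For instance, on a small Kind cluster you could shrink the server set at install time (a sketch assuming Helm 3 syntax and the chart's server block; a single server has no failure tolerance, so use it only for testing):

# Dev-sized install: one server, quorum of 1
helm install hashicorp ./consul-helm \
  --set server.replicas=1 \
  --set server.bootstrapExpect=1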
By disabling the affinity:
~/test/consul-helm$ cat values.yaml | grep -i -A 8 affinity
  # Affinity Settings
  # Commenting out or setting as empty the affinity variable, will allow
  # deployment to single node services such as Minikube
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: {{ template "consul.name" . }}
              release: "{{ .Release.Name }}"
              component: server
          topologyKey: kubernetes.io/hostname
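Putting it together, a minimal override file for a small Kind cluster might look like this (a sketch; small-cluster-values.yaml is a hypothetical file name, and setting affinity to null, as the docs quoted above describe, lets several server pods share a node):

# small-cluster-values.yaml
server:
  replicas: 3
  bootstrapExpect: 3
  affinity: null   # allow multiple server pods on one node

Then apply it with: helm upgrade hashicorp ./consul-helm -f small-cluster-values.yaml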