2-Node Cluster, Master goes down, Worker fails

Question

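We have a 2-node K3S cluster with one master and one worker node, and we would like "reasonable availability": if one node or the other goes down, the cluster should still work, i.e. ingress still reaches the services and pods, which we have replicated across both nodes. We have an external load balancer (F5) that runs active health checks against each node and only sends traffic to nodes that are up.

Unfortunately, if the master goes down, the worker will not serve any traffic (ingress).

This is strange, because all the service pods on the worker node (the ones ingress feeds) are running.

We suspect the reason is that key services such as the traefik ingress controller and coredns are only running on the master.

Indeed, when we simulated a master failure and restored it from a backup, none of the pods on the worker could do any DNS resolution. Only a reboot of the worker fixed this.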

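We have tried increasing the number of replicas of the traefik and coredns deployments, which helps a bit, BUT:

- This gets lost on the next reboot
- The worker still functions when the master is down, but every second ingress request fails
  - It seems the worker still blindly (round-robin) sends traffic to a non-existent master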

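We would appreciate some advice and explanations:

- Shouldn't key services such as traefik and coredns be DaemonSets by default?
- How can we change the service description (e.g. replica count) in a persistent way that does not get lost?
- How can we get intelligent traffic routing from ingress to only the nodes that are "up"?
- Would it make sense to make this a 2-master cluster?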

Update: ingress description:

kubectl describe ingress -n msa
Name:             msa-ingress
Namespace:        msa
Address:          10.3.229.111,10.3.229.112
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
  tls-secret terminates service.ourdomain.com,node1.ourdomain.com,node2.ourdomain.com
Rules:
  Host                           Path  Backends
  ----                           ----  --------
  service.ourdomain.com
                                /   gateway:8443 (10.42.0.100:8443,10.42.1.115:8443)
  node1.ourdomain.com
                                /   gateway:8443 (10.42.0.100:8443,10.42.1.115:8443)
  node2.ourdomain.com
                                /   gateway:8443 (10.42.0.100:8443,10.42.1.115:8443)
Annotations:                     kubernetes.io/ingress.class: traefik
                                traefik.ingress.kubernetes.io/router.middlewares: msa-middleware@kubernetescrd
Events:                          <none>

Answer 1

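Running a k8s cluster with only one or two master nodes is not recommended, because such a cluster cannot tolerate the failure of master components. Consider running 3 master nodes in your Kubernetes cluster.

The following link should help: https://netapp-trident.readthedocs.io/en/stable-v19.01/dag/kubernetes/kubernetes_cluster_architecture_considerations.html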
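
For K3s specifically, a three-server setup with embedded etcd could look roughly like the sketch below. This is only an illustration, not the asker's configuration: the path is the default K3s config file location, while the hostname k3s.example.internal and the token value are placeholders introduced here.

```yaml
# File 1: /etc/rancher/k3s/config.yaml on the FIRST server node
cluster-init: true                 # start a new cluster backed by embedded etcd
token: "replace-with-shared-secret"  # placeholder; all servers must use the same value
tls-san:
  - "k3s.example.internal"         # hypothetical fixed name in front of the servers
---
# File 2: /etc/rancher/k3s/config.yaml on the SECOND and THIRD server nodes
# (a separate file on each of those nodes, not part of the file above)
server: "https://k3s.example.internal:6443"  # join the existing cluster
token: "replace-with-shared-secret"
```

With three etcd members the quorum is two, so one server can fail. With two members the quorum is also two, which is why a 2-master cluster stops working as soon as either master is lost.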

Answer 2

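Your goals seem achievable with a few K8S built-in features (not specific to traefik):

- Make sure you have 1 replica of the Ingress Controller's Pod on each Node => use a DaemonSet as the installation method
- To fix the error in the Ingress description, set the correct load balancer IP on the Ingress Controller's Service.
- Set the external traffic policy to "Local". This ensures that traffic is routed to local endpoints only (the Controller Pods running on the Node that accepts the traffic from the load balancer)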

externalTrafficPolicy - denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints. There are two available options: Cluster (default) and Local. Cluster obscures the client source IP and may cause a second hop to another node, but should have good overall load-spreading. Local preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type Services, but risks potentially imbalanced traffic spreading.

apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example
  ports:
    - port: 8765
      targetPort: 9376
  externalTrafficPolicy: Local
  type: LoadBalancer

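The Service used as the Ingress backend should also set externalTrafficPolicy: Local. Note that this field applies to externally exposed Services, such as those of type NodePort or LoadBalancer.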
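
On K3s, one way to apply the DaemonSet and "Local" policy settings so that they survive restarts is a HelmChartConfig for the bundled Traefik chart. The following is a minimal sketch, assuming your K3s version ships Traefik through its Helm controller and that the bundled chart version supports these values. The file name traefik-config.yaml is an arbitrary choice made here.

```yaml
# Assumed location on the K3s server node:
#   /var/lib/rancher/k3s/server/manifests/traefik-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik              # must match the name of the packaged Traefik chart
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet        # one Traefik pod per node instead of a fixed replica count
    service:
      spec:
        externalTrafficPolicy: Local   # only route to the Traefik pod on the receiving node
```

Because the override lives in the manifests directory rather than in a hand-edited Deployment, K3s should re-apply it on startup instead of resetting the values. Whether the load balancer implementation in front of the Service honors the "Local" policy depends on the K3s version, so it is worth testing with one node switched off.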

kubernetes

kubernetes-ingress

traefik-ingress

k3s