Elasticsearch 7.2.0: master not discovered or elected yet, an election requires at least X nodes

I'm trying to automate horizontally scaling Elasticsearch nodes up and down in a Kubernetes cluster.

Initially, I deployed an Elasticsearch cluster (3 master nodes, 3 data nodes and 3 ingest nodes) on the Kubernetes cluster, with cluster.initial_master_nodes set to:

cluster.initial_master_nodes:
  - master-a
  - master-b
  - master-c

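Then I performed a scale-down operation, reducing the number of master nodes from 3 to 1 (an unusual case, only for testing). To do this, I deleted the master-c and master-b nodes and restarted the master-a node with the following setting: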

cluster.initial_master_nodes:
  - master-a

Since the Elasticsearch nodes (i.e. pods) use persistent volumes, after the restart master-a keeps logging the following message:

"message": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [TxdOAdryQ8GAeirXQHQL-g, VmtilfRIT6KDVv1R6MHGlw, KAJclUD2SM6rt9PxCGACSA], have discovered [] which is not a quorum; discovery will continue using [] from hosts providers and [{master-a}{VmtilfRIT6KDVv1R6MHGlw}{g29haPBLRha89dZJmclkrg}{10.244.0.95}{10.244.0.95:9300}{ml.machine_memory=12447109120, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 5, last-accepted version 40 in term 5"  }

It seems to be trying to find master-b and master-c.

Question:

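The cluster state, which includes the master configuration, is also stored in the data folder of each Elasticsearch node. In your case it appears to be reading the old cluster state (i.e. the 3 master nodes and their IDs).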

Can you delete the data folder of master-a so that it starts from a clean cluster state? That should solve your problem.

Also make sure the other data and ingest nodes have the node.master: false setting, because it defaults to true.
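For reference, here is a small sketch of what the dedicated-role settings look like, assuming the boolean role settings used by Elasticsearch 7.2 (node.master, node.data, node.ingest, all defaulting to true). The helpers role_settings and to_yaml are hypothetical and only render the lines you would put in each node's elasticsearch.yml.

```python
from typing import Dict

# Assumption: the 7.2-era boolean role settings, each of which defaults to true.
ROLES = ("master", "data", "ingest")


def role_settings(role: str) -> Dict[str, bool]:
    """Settings for a dedicated node that has exactly one role (hypothetical helper)."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    # Every role other than the chosen one is explicitly switched off.
    return {f"node.{r}": r == role for r in ROLES}


def to_yaml(settings: Dict[str, bool]) -> str:
    """Render the settings as elasticsearch.yml lines."""
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


if __name__ == "__main__":
    print(to_yaml(role_settings("ingest")))
    # node.master: false
    # node.data: false
    # node.ingest: true
```

Without the explicit node.master: false, a data or ingest node would also be master-eligible and take part in elections.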

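The cluster.initial_master_nodes setting only has an effect the very first time the cluster starts. To avoid some very rare corner cases, you should not change its value once you have set it, and in general you should remove it from the config file as soon as possible. From the reference manual on cluster.initial_master_nodes: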

You should not use this setting when restarting a cluster or adding a new node to an existing cluster.

Beyond that, Elasticsearch uses a quorum-based election protocol, and the manual says the following:

To be sure that the cluster remains available you must not stop half or more of the nodes in the voting configuration at the same time.

You stopped two of your three master-eligible nodes at the same time, which is more than half, so it is expected that the cluster no longer works.
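The arithmetic behind this can be sketched as a simplified majority model (it ignores that Elasticsearch adjusts the voting configuration over time). The function names are illustrative only. With the three node IDs from the log above, an election needs 2 of them, which matches "an election requires at least 2 nodes".

```python
def quorum_size(voting_nodes: int) -> int:
    """Smallest strict majority of the voting configuration."""
    return voting_nodes // 2 + 1


def survives_stopping(voting_nodes: int, stopped: int) -> bool:
    """True if the voting nodes still running can form a quorum."""
    return voting_nodes - stopped >= quorum_size(voting_nodes)


if __name__ == "__main__":
    # Three node IDs are listed in the last-known voting configuration.
    print(quorum_size(3))            # 2
    print(survives_stopping(3, 1))   # True: 2 of 3 remain
    print(survives_stopping(3, 2))   # False: only master-a remains
```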

The reference manual also contains instructions for removing master-eligible nodes, which you did not follow:

As long as there are at least three master-eligible nodes in the cluster, as a general rule it is best to remove nodes one-at-a-time, allowing enough time for the cluster to automatically adjust the voting configuration and adapt the fault tolerance level to the new set of nodes.

If there are only two master-eligible nodes remaining then neither node can be safely removed since both are required to reliably make progress. To remove one of these nodes you must first inform Elasticsearch that it should not be part of the voting configuration, and that the voting power should instead be given to the other node.

It goes on to describe how to use POST /_cluster/voting_config_exclusions/node_name to safely remove unwanted nodes from the voting configuration when scaling down to a single node.
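A minimal sketch of that procedure, assuming the cluster's HTTP endpoint is reachable at http://localhost:9200 (adjust for your Kubernetes Service) and using the 7.x path form of the API. It only builds the requests with the standard library; pass each one to urllib.request.urlopen to actually send it. The helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: the cluster's HTTP endpoint; adjust for your Kubernetes Service.
BASE_URL = "http://localhost:9200"


def exclude_from_voting(node_name: str, base_url: str = BASE_URL) -> Request:
    """Build (not send) the request excluding one node from the voting configuration."""
    url = f"{base_url}/_cluster/voting_config_exclusions/{quote(node_name)}"
    return Request(url, method="POST")


def clear_exclusions(base_url: str = BASE_URL) -> Request:
    """Build the request that clears the exclusion list after the nodes are gone."""
    return Request(f"{base_url}/_cluster/voting_config_exclusions", method="DELETE")


if __name__ == "__main__":
    # Exclude the masters that will be removed *before* stopping their pods,
    # so the voting power moves to master-a while a quorum still exists.
    for name in ["master-b", "master-c"]:
        req = exclude_from_voting(name)
        print(req.get_method(), req.full_url)
    # ...stop the master-b and master-c pods here, then clear the exclusions:
    req = clear_exclusions()
    print(req.get_method(), req.full_url)
```

The key point is the ordering: exclusions first, while all three nodes are still up, and only then stopping the pods.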