How to scale down a CrateDB cluster?

For testing, I wanted to scale down my 3-node cluster to 2 nodes, so that I can later do the same for my 5-node cluster.

However, after following the best practice for scaling down a cluster:

  1. Back up all tables
  2. For all tables: alter table xyz set (number_of_replicas=2) if it was less than 2 before
  3. SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>;
    3 a. If the data check should always be green, set the min_availability to 'full': https://crate.io/docs/reference/configuration.html#graceful-stop
  4. Initiate graceful stop on one node
  5. Wait for the data check to turn green
  6. Repeat from 3.
  7. When done, persist the node configurations in crate.yml:
       gateway.recover_after_nodes: n
       discovery.zen.minimum_master_nodes: (n/2) + 1
       gateway.expected_nodes: n
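The quorum arithmetic used in step 3 can be sketched as follows. This is only an illustration, assuming that "half of the cluster + 1" means integer division as in the `(N / 2) + 1` comment in crate.yml. The helper names `quorum` and `set_quorum_statement` are made up for this example and are not part of CrateDB; the sketch only builds the statement text and does not talk to a cluster.

```python
def quorum(cluster_size: int) -> int:
    """Return (N / 2) + 1 using integer division (assumed interpretation)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def set_quorum_statement(cluster_size: int) -> str:
    """Build the SET GLOBAL statement from step 3 for a given cluster size."""
    return (
        "SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = "
        f"{quorum(cluster_size)};"
    )


if __name__ == "__main__":
    # Print the statement for each cluster size between 5 and 2 nodes.
    for size in (5, 4, 3, 2):
        print(size, "nodes ->", set_quorum_statement(size))
```

Note that for both 2 and 3 nodes the value comes out as 2, so the setting does not change when going from 3 nodes to 2.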

My cluster never went back to "green", and my critical node check is failing as well.

What went wrong here?

crate.yml:

  ... 
  ################################## Discovery ##################################

  # Discovery infrastructure ensures nodes can be found within a cluster
  # and master node is elected. Multicast discovery is the default.

  # Set to ensure a node sees M other master eligible nodes to be considered
  # operational within the cluster. Its recommended to set it to a higher value
  # than 1 when running more than 2 nodes in the cluster.
  #
  # We highly recommend to set the minimum master nodes as follows:
  #   minimum_master_nodes: (N / 2) + 1 where N is the cluster size
  # That will ensure a full recovery of the cluster state.
  #
  discovery.zen.minimum_master_nodes: 2

  # Set the time to wait for ping responses from other nodes when discovering.
  # Set this option to a higher value on a slow or congested network
  # to minimize discovery failures:
  #
  # discovery.zen.ping.timeout: 3s
  #

  # Time a node is waiting for responses from other nodes to a published
  # cluster state.
  #
  # discovery.zen.publish_timeout: 30s

  # Unicast discovery allows to explicitly control which nodes will be used
  # to discover the cluster. It can be used when multicast is not present,
  # or to restrict the cluster communication-wise.
  # For example, Amazon Web Services doesn't support multicast discovery.
  # Therefore, you need to specify the instances you want to connect to a
  # cluster as described in the following steps:
  #
  # 1. Disable multicast discovery (enabled by default):
  #
  discovery.zen.ping.multicast.enabled: false
  #
  # 2. Configure an initial list of master nodes in the cluster
  #    to perform discovery when new nodes (master or data) are started:
  #
  # If you want to debug the discovery process, you can set a logger in
  # 'config/logging.yml' to help you doing so.
  #
  ################################### Gateway ###################################

  # The gateway persists cluster meta data on disk every time the meta data
  # changes. This data is stored persistently across full cluster restarts
  # and recovered after nodes are started again.

  # Defines the number of nodes that need to be started before any cluster
  # state recovery will start.
  #
  gateway.recover_after_nodes: 3

  # Defines the time to wait before starting the recovery once the number
  # of nodes defined in gateway.recover_after_nodes are started.
  #
  #gateway.recover_after_time: 5m

  # Defines how many nodes should be waited for until the cluster state is
  # recovered immediately. The value should be equal to the number of nodes
  # in the cluster.
  #
  gateway.expected_nodes: 3

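So there are two things that are important:

  • The number of replicas is essentially the number of nodes you can lose in a typical setup (2 is recommended, so that you can scale down, lose a node in the process, and still be fine)
  • The procedure is recommended for clusters > 2 nodes ;)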

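CrateDB automatically distributes the shards across the cluster in such a way that no replica and primary share a node. If that is not possible (for example, if you have 2 nodes and 1 primary with 2 replicas), the data check will never return to 'green'. So in your case, set the number of replicas to 1 to get the cluster back to green (alter table mytable set (number_of_replicas = 1)).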
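The reasoning above boils down to a simple count, sketched here under the simplifying assumption that every copy of a shard (1 primary plus its replicas) needs a node of its own. The function names are hypothetical, and the sketch ignores other allocation constraints such as disk space or allocation filters.

```python
def can_turn_green(nodes: int, number_of_replicas: int) -> bool:
    """True if 1 primary plus all replicas can each be placed on a separate node."""
    if nodes < 1 or number_of_replicas < 0:
        raise ValueError("nodes must be >= 1 and number_of_replicas >= 0")
    return 1 + number_of_replicas <= nodes


def max_replicas(nodes: int) -> int:
    """Largest number_of_replicas that still fits on the given number of nodes."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return nodes - 1


if __name__ == "__main__":
    # The situation from the question: 2 remaining nodes, 2 replicas.
    print(can_turn_green(nodes=2, number_of_replicas=2))
    # After lowering the replica count to 1.
    print(can_turn_green(nodes=2, number_of_replicas=1))
```

With 2 nodes and 2 replicas there are three copies for two nodes, so one replica stays unassigned; with 1 replica the two copies fit.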

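The critical node check fails because the cluster has not received an updated crate.yml yet: your file still contains the configuration for a 3-node cluster, hence the message. Since CrateDB only loads expected_nodes at startup (it is not a runtime setting), a restart of the whole cluster is required to conclude the scale-down. This can be done with a rolling restart, but be sure to set SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>; properly, otherwise the consensus will not work...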
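As a rough sketch, the three crate.yml values from step 7 can be derived from the target cluster size like this. The function name `crate_yml_settings` is hypothetical, and `(n/2) + 1` is assumed to use integer division; check the result against your own file before restarting any node.

```python
def crate_yml_settings(n: int) -> str:
    """Render the three size-dependent crate.yml lines for an n-node cluster."""
    if n < 1:
        raise ValueError("n must be at least 1")
    lines = [
        f"gateway.recover_after_nodes: {n}",
        f"discovery.zen.minimum_master_nodes: {n // 2 + 1}",
        f"gateway.expected_nodes: {n}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Values for the 2-node cluster from the question.
    print(crate_yml_settings(2))
```

For the 2-node target, only gateway.recover_after_nodes and gateway.expected_nodes differ from the file shown in the question; discovery.zen.minimum_master_nodes stays at 2.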

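Also, it is recommended to scale down one node at a time, to avoid overloading the cluster with rebalancing and accidentally losing data.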