How to scale down a CrateDB cluster?
For testing purposes, I want to scale my 3-node cluster down to 2 nodes, and later do the same with my 5-node cluster.
However, after following the best practices for shrinking a cluster:
1. Back up all tables
2. For all tables: `alter table xyz set (number_of_replicas=2)` if it was less than 2 before
3. `SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>;`
   a. If the data check should always be green, set the min_availability to 'full':
   https://crate.io/docs/reference/configuration.html#graceful-stop
4. Initiate graceful stop on one node
5. Wait for the data check to turn green
6. Repeat from step 3
7. When done, persist the node configurations in `crate.yml`:
   `gateway.recover_after_nodes: n`
   `discovery.zen.minimum_master_nodes: (n/2) + 1`
   `gateway.expected_nodes: n`
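The quorum rule used in step 3 and step 7 can be sketched as a tiny helper (the function name is illustrative, not part of CrateDB): "half of the cluster + 1" means integer division, i.e. floor(n / 2) + 1.

```python
def minimum_master_nodes(node_count: int) -> int:
    """Quorum size for an n-node cluster: half the cluster (rounded down) plus one."""
    return node_count // 2 + 1

# A majority quorum prevents a split-brain: two disjoint groups of nodes
# can never both reach the quorum at the same time.
for n in (2, 3, 5):
    print(f"{n}-node cluster -> minimum_master_nodes = {minimum_master_nodes(n)}")
```

Note that for both a 2-node and a 3-node cluster the quorum is 2, which is why scaling from 3 to 2 nodes does not change this particular setting.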
My cluster never returned to "green", and my critical node checks are now failing.
What went wrong here?
crate.yml:
...
################################## Discovery ##################################
# Discovery infrastructure ensures nodes can be found within a cluster
# and master node is elected. Multicast discovery is the default.
# Set to ensure a node sees M other master eligible nodes to be considered
# operational within the cluster. It's recommended to set it to a higher value
# than 1 when running more than 2 nodes in the cluster.
#
# We highly recommend to set the minimum master nodes as follows:
# minimum_master_nodes: (N / 2) + 1 where N is the cluster size
# That will ensure a full recovery of the cluster state.
#
discovery.zen.minimum_master_nodes: 2
# Set the time to wait for ping responses from other nodes when discovering.
# Set this option to a higher value on a slow or congested network
# to minimize discovery failures:
#
# discovery.zen.ping.timeout: 3s
#
# Time a node is waiting for responses from other nodes to a published
# cluster state.
#
# discovery.zen.publish_timeout: 30s
# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
# For example, Amazon Web Services doesn't support multicast discovery.
# Therefore, you need to specify the instances you want to connect to a
# cluster as described in the following steps:
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
# to perform discovery when new nodes (master or data) are started:
#
# If you want to debug the discovery process, you can set a logger in
# 'config/logging.yml' to help you do so.
#
################################### Gateway ###################################
# The gateway persists cluster meta data on disk every time the meta data
# changes. This data is stored persistently across full cluster restarts
# and recovered after nodes are started again.
# Defines the number of nodes that need to be started before any cluster
# state recovery will start.
#
gateway.recover_after_nodes: 3
# Defines the time to wait before starting the recovery once the number
# of nodes defined in gateway.recover_after_nodes are started.
#
#gateway.recover_after_time: 5m
# Defines how many nodes should be waited for until the cluster state is
# recovered immediately. The value should be equal to the number of nodes
# in the cluster.
#
gateway.expected_nodes: 3
So two things are important here:
- The number of replicas is essentially the number of nodes you can lose in a typical setup (2 is recommended, so that you can scale down and still lose a node in the process and be fine)
- The procedure is recommended for clusters with > 2 nodes ;)
CrateDB automatically distributes the shards across the cluster in such a way that no replica shares a node with its primary. If that is not possible (e.g. with 2 nodes, 1 primary, and 2 replicas), the data check will never return to 'green'. So in your case, set the number of replicas to 1 to get the cluster back to green (`alter table mytable set (number_of_replicas = 1)`).
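A minimal sketch of this placement constraint (helper names are hypothetical, for illustration only): since a primary and each of its replicas must sit on distinct nodes, a cluster of n nodes can fully allocate at most n - 1 replicas per shard.

```python
def max_allocatable_replicas(node_count: int) -> int:
    """Highest replica count a cluster can fully place: one copy per node."""
    return max(node_count - 1, 0)

def green_is_reachable(node_count: int, number_of_replicas: int) -> bool:
    """True if every shard copy (1 primary + replicas) fits on its own node."""
    return number_of_replicas <= max_allocatable_replicas(node_count)

# 2 nodes cannot host 1 primary + 2 replicas, so the check stays unhealthy:
print(green_is_reachable(2, 2))  # False
# Dropping to 1 replica makes a green state reachable again:
print(green_is_reachable(2, 1))  # True
```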
The critical node check fails because the cluster has not yet picked up the updated crate.yml: your file still contains the configuration for a 3-node cluster, hence the message. Since CrateDB only loads expected_nodes at startup (it is not a runtime setting), the whole cluster needs to be restarted to complete the scale-down. This can be done with a rolling restart, but be sure to correctly set `SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>;` first, otherwise consensus will not work...
Also, it is recommended to scale down one node at a time, to avoid overloading the cluster with rebalancing and accidentally losing data.