Achieve high availability and failover in an Artemis cluster with the shared-store HA policy through the JGroups protocol

Question

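The ActiveMQ Artemis documentation states that when high availability is configured with the replication HA policy, you can specify a group of live servers that a backup server can connect to. This is done by configuring group-name in the master and slave elements of broker.xml. A backup server will only connect to a live server that shares the same node group name.

But shared-store has no concept of group-name, and I am confused. If I have to achieve high availability with shared-store over JGroups, how can it be done?

When I instead tried the replication HA policy with a group-name, the cluster formed and failover worked, but I get these warnings: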

2020-10-02 16:35:21,517 WARN  [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=220da24b-049c-11eb-8da6-0050569b585d
2020-10-02 16:35:21,517 WARN  [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=220da24b-049c-11eb-8da6-0050569b585d
2020-10-02 16:35:25,350 WARN  [org.apache.activemq.artemis.core.server] AMQ224078: The size of duplicate cache detection (<id_cache-size/>) appears to be too large 20,000. It should be no greater than the number of messages that can be squeezed into confirmation window buffer (<confirmation-window-size/>) 32,000.

Answer 1

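As the name "shared-store" indicates, the live and backup brokers become a logical pair that can support high availability and fail-over because they share the same data store. Since they share the same data store, no group-name configuration is needed. Such an option would be confusing, redundant, and ultimately useless.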
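As a rough sketch of what such a pair might look like in broker.xml, assuming the master/slave element names used by Artemis releases from around the time of this question. The directory paths under /mnt/shared/artemis are hypothetical and not from the original post; the point is only that both brokers use the same locations on the shared file system and that no group-name appears anywhere.

```xml
<!-- Live broker (fragment inside <core>). Paths are hypothetical. -->
<paging-directory>/mnt/shared/artemis/paging</paging-directory>
<bindings-directory>/mnt/shared/artemis/bindings</bindings-directory>
<journal-directory>/mnt/shared/artemis/journal</journal-directory>
<large-messages-directory>/mnt/shared/artemis/large-messages</large-messages-directory>

<ha-policy>
   <shared-store>
      <master>
         <!-- Let the backup take over on a clean shutdown as well as on a crash -->
         <failover-on-shutdown>true</failover-on-shutdown>
      </master>
   </shared-store>
</ha-policy>
```

```xml
<!-- Backup broker (fragment inside <core>). Same hypothetical shared paths as the live broker. -->
<paging-directory>/mnt/shared/artemis/paging</paging-directory>
<bindings-directory>/mnt/shared/artemis/bindings</bindings-directory>
<journal-directory>/mnt/shared/artemis/journal</journal-directory>
<large-messages-directory>/mnt/shared/artemis/large-messages</large-messages-directory>

<ha-policy>
   <shared-store>
      <slave>
         <!-- Hand control back to the live broker when it returns -->
         <allow-failback>true</allow-failback>
      </slave>
   </shared-store>
</ha-policy>
```

The pairing comes entirely from the shared directories: whichever broker holds the lock on the shared journal is the active one.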

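The JGroups configuration (and more generally the cluster-connection) exists because the two brokers need to exchange information with each other about their respective network locations, so that the live broker can tell clients how to connect to the backup in case of a failure.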
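A minimal sketch of how that exchange could be wired up with JGroups in broker.xml is shown below. The connector URL, the JGroups file name jgroups.xml, the channel name, and the group and cluster names are all assumptions chosen for illustration; none of them come from the original post. Each broker would carry an equivalent fragment with its own connector address.

```xml
<!-- Fragment inside <core>. All names and addresses here are hypothetical. -->
<connectors>
   <!-- How other brokers and clients reach this broker -->
   <connector name="netty-connector">tcp://broker1.example.com:61616</connector>
</connectors>

<broadcast-groups>
   <broadcast-group name="bg-group1">
      <!-- JGroups stack definition, expected on the broker's classpath -->
      <jgroups-file>jgroups.xml</jgroups-file>
      <jgroups-channel>artemis_broadcast_channel</jgroups-channel>
      <broadcast-period>2000</broadcast-period>
      <connector-ref>netty-connector</connector-ref>
   </broadcast-group>
</broadcast-groups>

<discovery-groups>
   <discovery-group name="dg-group1">
      <jgroups-file>jgroups.xml</jgroups-file>
      <jgroups-channel>artemis_broadcast_channel</jgroups-channel>
      <refresh-timeout>10000</refresh-timeout>
   </discovery-group>
</discovery-groups>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <discovery-group-ref discovery-group-name="dg-group1"/>
   </cluster-connection>
</cluster-connections>
```

The live and backup brokers would need to use the same JGroups stack and channel name so that they can discover each other.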

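Regarding the WARN message about duplicate node IDs on the network: you might get that warning once, perhaps twice, during failover or fail-back, but if you see it more often than that, then there is a problem. If you are using shared-store, it indicates a problem with the locks on the shared file system. If you are using replication, it indicates a potential misconfiguration or split-brain.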

failover

jgroups

high-availability

activemq-artemis