WARN: There are more than one servers on the network broadcasting the same node id

Question

I have a cluster of three ActiveMQ Artemis brokers running on different machines. I now see the following warning logged repeatedly:

2020-06-17 10:40:07,378 WARN  [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=03451127-a9c9-11ea-992a-005056ad92be

Here is the relevant snippet from the master's broker.xml:

<connectors>
   <connector name="nettyartemis">tcp://10.5.100.1:61616</connector>
</connectors>

<discovery-groups>
   <discovery-group name="my-discovery-group">
      <local-bind-address>10.5.100.1</local-bind-address>
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
      <refresh-timeout>10000</refresh-timeout>
   </discovery-group>
</discovery-groups>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>nettyartemis</connector-ref>
      <retry-interval>500</retry-interval>
      <use-duplicate-detection>true</use-duplicate-detection>
      <message-load-balancing>STRICT</message-load-balancing>
      <max-hops>1</max-hops>
      <discovery-group-ref discovery-group-name="my-discovery-group"/>
   </cluster-connection>
</cluster-connections>


<broadcast-groups>
   <broadcast-group name="my-broadcast-group">
      <local-bind-address>10.5.100.1</local-bind-address>
      <local-bind-port>5432</local-bind-port>
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
      <broadcast-period>2000</broadcast-period>
      <connector-ref>nettyartemis</connector-ref>
   </broadcast-group>
</broadcast-groups>    

<ha-policy>
   <replication>
      <master/>
   </replication>
</ha-policy>

Here is the relevant snippet from the broker.xml of one of the slaves:

<connectors>
   <connector name="nettyartemistwo">tcp://10.5.100.2:61616</connector>
</connectors>

<discovery-groups>
   <discovery-group name="my-discovery-group">
      <local-bind-address>10.5.100.2</local-bind-address>
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
      <refresh-timeout>10000</refresh-timeout>
   </discovery-group>
</discovery-groups>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>nettyartemistwo</connector-ref>
      <retry-interval>500</retry-interval>
      <use-duplicate-detection>true</use-duplicate-detection>
      <message-load-balancing>STRICT</message-load-balancing>
      <max-hops>1</max-hops>
      <discovery-group-ref discovery-group-name="my-discovery-group"/>
   </cluster-connection>
</cluster-connections>

<broadcast-groups>
   <broadcast-group name="my-broadcast-group">
      <local-bind-address>10.5.100.2</local-bind-address>
      <local-bind-port>5432</local-bind-port>
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
      <broadcast-period>2000</broadcast-period>
      <connector-ref>nettyartemistwo</connector-ref>
   </broadcast-group>
</broadcast-groups>

<ha-policy>
   <replication>
      <slave/>
   </replication>
</ha-policy>

Any suggestions as to why I am getting this warning?

Answer 1

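When a broker instance is first started, it initializes its journal. One of the things the broker does during this initialization phase is generate a UUID that uniquely identifies the broker for things like clustering. This is called the "node id."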

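Typically, when users see There are more than one servers on the network broadcasting the same node id, it means they have manually copied one broker's journal to another broker. This usually happens when users are first setting up a cluster of brokers, because they want to copy the configuration rather than start from scratch on each node. However, instead of copying just broker.xml to the other node, they copy the whole journal as well, and since the journal contains the unique "node id," both brokers end up using the same ID.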

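The solution in this case is to delete the journal (stored in the data directory by default) from one of the brokers logging this message. Once that broker is restarted, the journal will be re-initialized and a new node ID will be created.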
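As a rough pointer to what "the journal" covers, a broker instance's broker.xml usually names its data locations with entries like the ones below. The paths shown are what I would expect from a freshly created instance and are an assumption here; check your own broker.xml for the actual values.

```xml
<!-- Assumed default locations, relative to the broker instance directory. -->
<!-- The node ID is persisted with this data, so copying these directories -->
<!-- to another machine copies the node ID along with them. -->
<paging-directory>data/paging</paging-directory>
<bindings-directory>data/bindings</bindings-directory>
<journal-directory>data/journal</journal-directory>
<large-messages-directory>data/large-messages</large-messages-directory>
```

When cloning a configuration to a new node, copy only broker.xml and leave these directories out.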

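This WARN message can also be logged if you have HA configured and the master and slave are both active at the same time. A master and its slave naturally share the same node ID because they have the same journal (via shared storage or replication). However, only one of the two brokers should be active at any given time. If both are active, this is known as "split-brain." This situation can be very problematic because the two brokers operate independently on the same journal data. It can lead to duplicate messages as well as seemingly lost messages, and restoring the integrity of the data can be very difficult, if not impossible.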

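In a shared-store configuration, the shared storage itself mitigates the risk of split-brain because of the shared file lock on the journal. Only one broker is allowed to actually access the journal data.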
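For comparison, a shared-store pair is declared roughly as follows. This is a minimal sketch, not taken from the question (which uses replication), and it assumes both brokers point their journal directories at the same shared file system.

```xml
<!-- Master broker (sketch) -->
<ha-policy>
   <shared-store>
      <master/>
   </shared-store>
</ha-policy>

<!-- Slave broker (sketch) -->
<ha-policy>
   <shared-store>
      <slave/>
   </shared-store>
</ha-policy>
```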

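In a replication configuration, however, the risk of split-brain is much higher, specifically because the master and the slave each have their own copy of the data. If the network connection between the master and the slave fails, the slave has no reliable way to determine whether the master is really dead or whether it is just a network problem. This is why the documentation recommends using at least 3 live/backup pairs. That allows a proper quorum to be established so that the live cluster members can vote on whether a fail-over is appropriate.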
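If you do run several live/backup pairs, each backup can be tied to its intended live broker with a group name. The following is only a sketch under that assumption; the value pair-1 is a made-up label, and the other two pairs would use their own labels.

```xml
<!-- Live broker of the first pair (sketch; "pair-1" is a hypothetical label) -->
<ha-policy>
   <replication>
      <master>
         <group-name>pair-1</group-name>
      </master>
   </replication>
</ha-policy>

<!-- Backup broker of the first pair: same group-name so it only pairs with that live broker -->
<ha-policy>
   <replication>
      <slave>
         <group-name>pair-1</group-name>
      </slave>
   </replication>
</ha-policy>
```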

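I also see that you have not set <check-for-live-server>true</check-for-live-server> on the master. This can cause split-brain in the simple case where a fail-over occurs and you then restart the master broker without first shutting down the slave. Without <check-for-live-server>true</check-for-live-server>, the master broker will simply start without checking whether another broker (e.g. its backup) is already broadcasting its node ID.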
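Applied to the master snippet from the question, the change would look something like this. It is a sketch of the ha-policy element only; the rest of the master's broker.xml stays as posted.

```xml
<ha-policy>
   <replication>
      <master>
         <!-- On startup, check whether another broker is already live with this node ID -->
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>
```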

jms

activemq-artemis