Artemis HA configuration for core bridges

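We have 4 servers in two shared-disk HA pairs, with core bridges between them. The core bridge configuration and the connectors the bridges use (sms1 and sms1b) are identical on all 4 servers. The only differences are master versus slave in the ha-policy, and the hostnames in other fields (the acceptors, the artemis and node0 connectors, and the name).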

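In testing, we found that the bridges work fine when both live servers are up, but sometimes, when one live server is shut down, the backup never opens a consumer for the bridge.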

Is this the intended way to configure HA server pairs with core bridges, or are the backup servers misconfigured?

<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xi="http://www.w3.org/2001/XInclude"
               xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">

   <core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="urn:activemq:core ">

      <name>ba-artms3.example.com</name>      
      <security-enabled>false</security-enabled>
      <persistence-enabled>true</persistence-enabled>
      <paging-directory>/data/ba_artemis/msg-sms1/paging</paging-directory>
      <bindings-directory>/data/ba_artemis/msg-sms1/bindings</bindings-directory>
      <journal-directory>/data/ba_artemis/msg-sms1/journal</journal-directory>
      <large-messages-directory>/data/ba_artemis/msg-sms1/large-messages</large-messages-directory>

      <journal-datasync>true</journal-datasync>
      <journal-min-files>2</journal-min-files>
      <journal-pool-files>10</journal-pool-files>
      <journal-device-block-size>4096</journal-device-block-size>
      <journal-file-size>10M</journal-file-size>
      <journal-buffer-timeout>132000</journal-buffer-timeout>
      <journal-max-io>4096</journal-max-io>
      <connectors>
            <connector name="artemis">tcp://ba-artms3.example.com:2539</connector>
            <connector name = "node0">tcp://ba-artms4.example.com:2539</connector>
            <connector name="sms1">(tcp://ba-artms3.example.com:61616,tcp://ba-artms4.example.com:61616)</connector>
            <connector name="sms1b">(tcp://ba-artms9.example.com:61616,tcp://ba-artms10.example.com:61616)</connector>
      </connectors>

      <disk-scan-period>5000</disk-scan-period>
      <max-disk-usage>90</max-disk-usage>
      <critical-analyzer>true</critical-analyzer>
      <critical-analyzer-timeout>120000</critical-analyzer-timeout>
      <critical-analyzer-check-period>60000</critical-analyzer-check-period>
      <critical-analyzer-policy>HALT</critical-analyzer-policy>
      <page-sync-timeout>620000</page-sync-timeout>

      <acceptors>
         <acceptor name="artemis">tcp://ba-artms3.example.com:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true;</acceptor>
         <acceptor name="cluster">tcp://ba-artms3.example.com:2539?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE;useEpoll=true</acceptor>
         <acceptor name="amqp">tcp://ba-artms3.example.com:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>

      </acceptors>

      <cluster-user>msg-sms1-cluster</cluster-user>

      <cluster-password>redacted</cluster-password>
      <cluster-connections>
         <cluster-connection name="msg-sms1">
            <connector-ref>artemis</connector-ref>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>0</max-hops>
            <static-connectors>
               <connector-ref>node0</connector-ref>

            </static-connectors>
         </cluster-connection>
      </cluster-connections>

      <ha-policy>
         <shared-store>
            <master>
               <failover-on-shutdown>true</failover-on-shutdown>
            </master>
         </shared-store>
      </ha-policy>

      <security-settings>
         <security-setting match="#">
            <permission type="createNonDurableQueue" roles="amq"/>
            <permission type="deleteNonDurableQueue" roles="amq"/>
            <permission type="createDurableQueue" roles="amq"/>
            <permission type="deleteDurableQueue" roles="amq"/>
            <permission type="createAddress" roles="amq"/>
            <permission type="deleteAddress" roles="amq"/>
            <permission type="consume" roles="amq"/>
            <permission type="browse" roles="amq"/>
            <permission type="send" roles="amq"/>
            <!-- we need this otherwise ./artemis data imp wouldn't work -->
            <permission type="manage" roles="amq"/>
         </security-setting>
      </security-settings>

      <address-settings>

         <address-setting match="activemq.management#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
         <address-setting match="#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-create-jms-queues>true</auto-create-jms-queues>
            <auto-create-jms-topics>true</auto-create-jms-topics>
         </address-setting>
      </address-settings>

      <xi:include href="${configDir}/addresses.xml"/>

 <bridges>
     <bridge name="sms1_forwarder">
       <queue-name>UpdateOutboundForward_0</queue-name>
       <forwarding-address>UpdateOutbound</forwarding-address>
       <ha>true</ha>
       <failover-on-server-shutdown>true</failover-on-server-shutdown>
       <user>rave</user>
       <password>redacted</password>
       <static-connectors>
         <connector-ref>sms1</connector-ref>
       </static-connectors>
     </bridge>
     <bridge name="sms1b_forwarder">
       <queue-name>UpdateOutboundForward_1</queue-name>
       <forwarding-address>UpdateOutbound</forwarding-address>
       <ha>true</ha>
       <failover-on-server-shutdown>true</failover-on-server-shutdown>
       <user>rave</user>
       <password>redacted</password>
       <static-connectors>
         <connector-ref>sms1b</connector-ref>
       </static-connectors>
     </bridge>
  </bridges>
   </core>
</configuration>

Keep in mind that the acceptor on port 2539 is dedicated to clustering. There are 4 servers in total: ba-artms3 (live) with ba-artms4 (slave), and ba-artms9 (live) with ba-artms10 (slave).
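For reference, the slave side of a shared-store pair would typically differ from the master configuration above only in its ha-policy block. The question does not show the backup's broker.xml, so the fragment below is an assumed sketch of what ba-artms4 and ba-artms10 might contain, not the poster's actual configuration; the allow-failback value in particular is a guess.

```xml
<!-- Assumed ha-policy for a backup (slave) broker in a shared-store pair. -->
<!-- Not taken from the original post; shown only for comparison with the master block. -->
<ha-policy>
   <shared-store>
      <slave>
         <!-- assumption: hand control back to the master when it returns -->
         <allow-failback>true</allow-failback>
      </slave>
   </shared-store>
</ha-policy>
```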

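Your configuration looks basically correct, but it's hard to say with so many moving parts, especially the extra acceptor for clustering. I've seen people do this before, but I've never tested or recommended it, so I'm not sure how it will play out in practice. In theory it's fine, but there are always complicating factors, and many of them are subtle.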

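It would be worth simplifying your configuration to only what is absolutely necessary to reproduce the problem. For example, configure just 1 bridge on 1 server connecting to a live/backup pair, with all the brokers on your local machine on unique ports (i.e. no docker). Once you get that working, you can keep adding complexity and testing to see where things go wrong (assuming they do).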
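A stripped-down test setup along those lines might look roughly like the fragment below, placed inside the core element of the forwarding broker. This is only a sketch: the connector names, the bridge name, and the localhost ports 61617 and 61618 are hypothetical, and giving each target broker its own connector is just one way to lay it out. The queue and address names are borrowed from the question.

```xml
<!-- Minimal reproduction sketch: one bridge on one broker, target pair on localhost. -->
<!-- All names and ports here are assumptions for illustration. -->
<connectors>
   <!-- live broker of the target pair -->
   <connector name="target-live">tcp://localhost:61617</connector>
   <!-- backup broker of the target pair, on its own port -->
   <connector name="target-backup">tcp://localhost:61618</connector>
</connectors>

<bridges>
   <bridge name="test_forwarder">
      <queue-name>UpdateOutboundForward_0</queue-name>
      <forwarding-address>UpdateOutbound</forwarding-address>
      <ha>true</ha>
      <static-connectors>
         <connector-ref>target-live</connector-ref>
         <connector-ref>target-backup</connector-ref>
      </static-connectors>
   </bridge>
</bridges>
```

With no cluster connection and no dedicated cluster acceptor in the picture, shutting down the target live broker should show whether the bridge by itself fails over to the backup.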

The solution that worked for me was to define one connector per server instead of using the parentheses-and-comma syntax:

<connectors>
  <connector name="artemis">tcp://ba-artms3.example.com:2539</connector>
  <connector name = "node0">tcp://ba-artms4.example.com:2539</connector>
  <connector name="sms1_1">tcp://ba-artms3.example.com:2539</connector>
  <connector name="sms1_2">tcp://ba-artms4.example.com:2539</connector>
  <connector name="sms1b_1">tcp://ba-artms9.example.com:2539</connector>
  <connector name="sms1b_2">tcp://ba-artms10.example.com:2539</connector>
</connectors>

Then reference each of them in the bridge's static-connectors list:

<bridge name="sms1b_forwarder">
  <queue-name>UpdateOutboundForward_1</queue-name>
  <forwarding-address>UpdateOutbound</forwarding-address>
  <ha>true</ha>
  <failover-on-server-shutdown>true</failover-on-server-shutdown>
  <user>rave</user>
  <password>redacted</password>
  <static-connectors>
    <connector-ref>sms1b_1</connector-ref>
    <connector-ref>sms1b_2</connector-ref>
  </static-connectors>
</bridge>
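Only sms1b_forwarder is shown here, but the log output for the working setup lists sms1_1 and sms1_2 under sms1_forwarder, so the other bridge was presumably changed the same way. A sketch of that counterpart, assuming everything except the connector references stays as in the question:

```xml
<!-- Assumed counterpart for the first bridge; inferred from the log output, not shown by the poster. -->
<bridge name="sms1_forwarder">
  <queue-name>UpdateOutboundForward_0</queue-name>
  <forwarding-address>UpdateOutbound</forwarding-address>
  <ha>true</ha>
  <failover-on-server-shutdown>true</failover-on-server-shutdown>
  <user>rave</user>
  <password>redacted</password>
  <static-connectors>
    <!-- one reference per broker in the target pair -->
    <connector-ref>sms1_1</connector-ref>
    <connector-ref>sms1_2</connector-ref>
  </static-connectors>
</bridge>
```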

After that, instead of seeing this in the log:

2021-07-20 09:13:51,809 WARN  [org.apache.activemq.artemis.core.server] AMQ224091: Bridge BridgeImpl@5886a172 [name=sms1b_forwarder, queue=QueueImpl[name=UpdateOutboundForward_1, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=13be52e3-cb0f-11eb-a851-000c29d5fa03], temp=false]@56414412 targetConnector=ServerLocatorImpl (identity=Bridge sms1b_forwarder) [initialConnectors=[TransportConfiguration(name=sms1b, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=ba-artms9-example-com], discoveryGroupConfiguration=null]] is unable to connect to destination. Retrying
2021-07-20 09:13:51,880 INFO  [org.apache.activemq.artemis.core.server] AMQ221027: Bridge BridgeImpl@983e222 [name=sms1_forwarder, queue=QueueImpl[name=UpdateOutboundForward_0, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=13be52e3-cb0f-11eb-a851-000c29d5fa03], temp=false]@7440d62 targetConnector=ServerLocatorImpl (identity=Bridge sms1_forwarder) [initialConnectors=[TransportConfiguration(name=sms1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=ba-artms3-example-com], discoveryGroupConfiguration=null]] is connected

I saw:

2021-07-20 09:30:40,333 INFO  [org.apache.activemq.artemis.core.server] AMQ221027: Bridge BridgeImpl@339a572d [name=sms1b_forwarder, queue=QueueImpl[name=UpdateOutboundForward_1, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=13be52e3-cb0f-11eb-a851-000c29d5fa03], temp=false]@3d5daf2e targetConnector=ServerLocatorImpl (identity=Bridge sms1b_forwarder) [initialConnectors=[TransportConfiguration(name=sms1b_1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=ba-artms9-example-com, TransportConfiguration(name=sms1b_2, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=ba-artms10-example-com], discoveryGroupConfiguration=null]] is connected
2021-07-20 09:30:42,206 INFO  [org.apache.activemq.artemis.core.server] AMQ221027: Bridge BridgeImpl@47c22520 [name=sms1_forwarder, queue=QueueImpl[name=UpdateOutboundForward_0, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=13be52e3-cb0f-11eb-a851-000c29d5fa03], temp=false]@4905c3a8 targetConnector=ServerLocatorImpl (identity=Bridge sms1_forwarder) [initialConnectors=[TransportConfiguration(name=sms1_1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=2539&host=ba-artms3-example-com, TransportConfiguration(name=sms1_2, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=2539&host=ba-artms4-example-com], discoveryGroupConfiguration=null]] is connected

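Note that both servers of each pair now show up in initialConnectors, whereas before only the first host from the combined connector was listed.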