远程 Apache Ignite 集群连接失败

Question

我可以在本地 Docker 服务器上成功加入和保留单个节点 Apache Ignite 2.8.1 拓扑运行作为 Docker 容器。

运行完全相同的程序，但在远程 Docker 服务器上，我可以看到我的程序加入集群拓扑，但在连接完成之前，我收到以下连接错误

SEVERE: Failed to send message to remote node [node=TcpDiscoveryNode [id=a239f009-bddd-4a06-845f-abb304850849, consistentId=127.0.0.1,172.17.0.13:42002, addrs=ArrayList [127.0.0.1, 172.17.0.13], sockAddrs=HashSet [/172.17.0.13:42002, /127.0.0.1:42002], discPort=42002, order=1, intOrder=1, lastExchangeTime=1605015503009, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtPartitionsSingleMessage [parts=null, partCntrs=null, partsSizes=null, partHistCntrs=null, err=null, client=true, exchangeStartTime=106333448635300, finishMsg=null, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=dc9a3700-5377-4095-ac2b-31a2cea3d9a5, consistentId=dc9a3700-5377-4095-ac2b-31a2cea3d9a5, addrs=ArrayList [0:0:0:0:0:0:0:1, 10.91.7.30, 127.0.0.1, 192.168.1.81, 192.168.38.1], sockAddrs=HashSet [host.docker.internal/192.168.1.81:0, /0:0:0:0:0:0:0:1:0, GBLG7Y7GH2.mshome.net/192.168.38.1:0, /127.0.0.1:0, GBLG7Y7GH2.enterprisenet.org/10.91.7.30:0], discPort=0, order=2, intOrder=0, lastExchangeTime=1605015498538, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=true], topVer=2, nodeId8=dc9a3700, msg=null, type=NODE_JOINED, tstamp=1605015505481], nodeId=dc9a3700, evt=NODE_JOINED], lastVer=GridCacheVersion [topVer=0, order=1605015496511, nodeOrder=0], super=GridCacheMessage [msgId=1, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], err=null, skipPrepare=false]]]]]
class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=a239f009-bddd-4a06-845f-abb304850849, addrs=[/172.17.0.13:42003, /127.0.0.1:42003]]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3738)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
    at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendLocalPartitions(GridDhtPartitionsExchangeFuture.java:2020)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.clientOnlyExchange(GridDhtPartitionsExchangeFuture.java:1436)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:903)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3214)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=a239f009-bddd-4a06-845f-abb304850849, addrs=[/172.17.0.13:42003, /127.0.0.1:42003]]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3740)
        ... 15 more
    Caused by: java.net.SocketTimeoutException
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:129)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3584)
        ... 15 more
Caused by: java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:129)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3584)
    ... 15 more

我认为问题与客户端连接设置有关，因此我尝试增加客户端发现 SPI“joinTimeout”、“networkTimeout”和“socketTimeout”设置以及“connectionTimeout”和“socketWriteTimeout”设置但没有成功。

Answer 1

您是否在 Docker 和远程节点之间同时打开了 47500 和 45100 端口？

Answer 2

您必须在远程 Docker 容器中为节点运行设置一个 AddressResolver。

看看：https://www.gridgain.com/docs/latest/installation-guide/aws/manual-install-on-ec2#connecting-a-client-node

如果您使用的是 Spring 配置，那么您的配置应该如下所示：

    <property name="addressResolver">
        <bean class="org.apache.ignite.configuration.BasicAddressResolver">
            <constructor-arg>
                <map>
                    <entry key="172.31.59.27" value="3.93.186.198"/>
                </map>
            </constructor-arg>
        </bean>
    </property>

    <!-- other properties -->

    <!-- Discovery configuration -->
</bean>

此处 172.31.59.27 是内部 IP，3.93.186.198 是您要连接的外部 IP。

远程 Apache Ignite 集群连接失败

Remote Apache Ignite cluster connection failures

ignite