Node getting disconnected in 2 server grid in Apache Ignite

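I have a 2-server grid in Apache Ignite. While the database is being loaded into the cache, one of the nodes gets disconnected, and I get the error message below. I also tried setting the FailureDetectionTimeout and NetworkTimeout values to the maximum limit, e.g. 2147483647. I also tried the JVM tuning described in the post "JVM Tuning" on both nodes, but I still get the same error.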

[16:30:31,244][SEVERE][pub-#96%null%][DataStreamProcessor] Failed to respond to node [nodeId=797bf03b-3baf-4724-8eca-ccccec64605c, res=DataStreamerResponse [reqId=34834, forceLocDep=true]]
class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=797bf03b-3baf-4724-8eca-ccccec64605c, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.42.1, 127.0.0.1, 192.168.140.52], sockAddrs=[01hw146471/192.168.140.52:47500, /10.0.42.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1478687030160, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false], topic=T1 [topic=TOPIC_DATASTREAM, id=803ded84851-797bf03b-3baf-4724-8eca-ccccec64605c], msg=DataStreamerResponse [reqId=34834, forceLocDep=true], policy=0]
 at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1309)
 at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1361)
 at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1331)
 at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.sendResponse(DataStreamProcessor.java:348)
 at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:313)
 at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:50)
 at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.onMessage(DataStreamProcessor.java:80)
 at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1238)
 at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:866)
 at org.apache.ignite.internal.managers.communication.GridIoManager.access00(GridIoManager.java:106)
 at org.apache.ignite.internal.managers.communication.GridIoManager.run(GridIoManager.java:829)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=797bf03b-3baf-4724-8eca-ccccec64605c, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.42.1, 127.0.0.1, 192.168.140.52], sockAddrs=[01hw146471/192.168.140.52:47500, /10.0.42.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1478687030160, loc=false, ver=1.7.0#20160801-sha1:383273e3, isClient=false]
 at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1996)
 at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1936)
 at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1304)
 ... 13 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=797bf03b-3baf-4724-8eca-ccccec64605c, addrs=[01hw146471/192.168.140.52:47100, /10.0.42.1:47100, /0:0:0:0:0:0:0:1%lo:47100, /127.0.0.1:47100]]
 at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2499)
 at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2140)
 at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2034)
 at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1970)
 ... 15 more
 Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: 01hw146471/192.168.140.52:47100
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
  ... 18 more
 Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2709)
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2371)
  ... 18 more
 Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /10.0.42.1:47100
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
  ... 18 more
 Caused by: java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
  at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2363)
  ... 18 more
 Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /0:0:0:0:0:0:0:1%lo:47100
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
  ... 18 more
 Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=797bf03b-3baf-4724-8eca-ccccec64605c, rcvd=54ac75f7-7b87-4502-ba8c-1e3a82e87be3]
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2614)
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2371)
  ... 18 more
 Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47100
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
  ... 18 more
 Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=797bf03b-3baf-4724-8eca-ccccec64605c, rcvd=54ac75f7-7b87-4502-ba8c-1e3a82e87be3]
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2614)
  at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2371)
  ... 18 more

[16:30:31] Topology snapshot [ver=7, servers=1, clients=0, CPUs=48, heap=50.0GB]

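This message usually means that the target node has died or is unresponsive. Make sure that:

  • Both nodes have enough heap, do not run out of memory, and do not suffer from long GC pauses.
  • The network is stable and both nodes can connect to each other.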
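
One way to check the second point is to test, from each server, whether the other node's Ignite ports accept TCP connections. The following is only a minimal sketch using plain java.net, not anything from Ignite itself. The class name PortCheck and the method canConnect are made up for illustration. The default address 192.168.140.52 and the ports 47500 (discovery) and 47100 (communication) are taken from the log above. Note that a successful connect only shows that something is listening on that port. It does not prove that the listener is the expected node, as the "Remote node ID is not as expected" entries for the loopback addresses show.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Hypothetical helper: returns true if a TCP connection to host:port
    // can be opened within timeoutMs milliseconds, false otherwise.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers connection refused, connect timeout, and unresolved host.
            return false;
        }
    }

    public static void main(String[] args) {
        // Address and ports as they appear in the log above; pass another
        // host as the first argument to check a different node.
        String host = args.length > 0 ? args[0] : "192.168.140.52";
        for (int port : new int[] {47500, 47100}) {
            System.out.printf("%s:%d reachable=%b%n",
                host, port, canConnect(host, port, 2000));
        }
    }
}
```

Run it on both servers, each time pointing at the other node's address. If a port is refused in one direction only, a firewall rule or a node bound to the wrong interface is a more likely cause than GC pauses.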