Establishing Client Connection to Ignite Cluster Causes OutOfMemoryError on Server

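I am running into an interesting error when trying to connect an Ignite client to a cluster.

When I connect with the setup below, I get the following errors on the client and on the server:

Client log: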

24-Feb-2021 15:18:31.135 WARNING [tcp-client-disco-msg-worker-#4%igniteClientInstance%-#39%igniteClientInstance%] org.apache.ignite.logger.java.JavaLogger.warning Timed out waiting for message to be read (most probably, the reason is long GC pauses on remote node) [curTimeout=1000, rmtAddr=/XXX.XXX.XXX.XXX:yyyy, rmtPort=yyyy]
24-Feb-2021 15:18:31.137 WARNING [tcp-client-disco-msg-worker-#4%igniteClientInstance%-#39%igniteClientInstance%] org.apache.ignite.logger.java.JavaLogger.warning Failed to connect ...skipping...


And this is the server-side log:

[14:58:09,536][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 1037 milliseconds.
[14:58:10,536][SEVERE][grid-nio-worker-client-listener-0-#31][ClientListenerProcessor] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=4 lim=439 cap=8192], super=AbstractNioClientWorker [idx=0, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-client-listener-0, igniteInstanceName=null, finished=false, heartbeatTs=1614178688450, hashCode=645844509, interrupted=false, runner=grid-nio-worker-client-listener-0-#31]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, closeSocket=true, outboundMessagesQueueSizeMetric=null, super=GridNioSessionImpl [locAddr=/10.0.1.81:10800, rmtAddr=/10.0.0.229:37584, createTime=1614178688450, closeTime=0, bytesSent=0, bytesRcvd=439, bytesSent0=0, bytesRcvd0=439, sndSchedTime=1614178688450, lastSndTime=1614178688450, lastRcvTime=1614178688450, readsPaused=false, filterChain=FilterChain[filters=[GridNioAsyncNotifyFilter, GridNioCodecFilter [parser=ClientListenerBufferedParser, directMode=false]], accepted=true, markedForClose=false]]]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.ignite.internal.processors.odbc.ClientListenerNioServerBuffer.read(ClientListenerNioServerBuffer.java:81)
        at org.apache.ignite.internal.processors.odbc.ClientListenerBufferedParser.decode(ClientListenerBufferedParser.java:57)
        at org.apache.ignite.internal.processors.odbc.ClientListenerBufferedParser.decode(ClientListenerBufferedParser.java:39)
        at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:113)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3704)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:175)
        at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1192)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2478)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2243)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1880)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
[14:58:10,539][SEVERE][grid-nio-worker-client-listener-0-#31][ClientListenerProcessor] Closing NIO session because of unhandled exception.
class org.apache.ignite.internal.util.nio.GridNioException: Java heap space
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2504)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2243)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1880)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.ignite.internal.processors.odbc.ClientListenerNioServerBuffer.read(ClientListenerNioServerBuffer.java:81)
        at org.apache.ignite.internal.processors.odbc.ClientListenerBufferedParser.decode(ClientListenerBufferedParser.java:57)
        at org.apache.ignite.internal.processors.odbc.ClientListenerBufferedParser.decode(ClientListenerBufferedParser.java:39)
        at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:113)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3704)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:175)
        at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1192)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2478)
        ... 4 more
[14:58:12,523][WARNING][grid-timeout-worker-#22][ClientListenerNioListener] Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.0.0.229:44098]

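The client and the server are on the same network but on different machines. In addition, the server runs in Kubernetes.

If I use a thin client instead, I can connect to the Ignite server and run queries without any problem.

Java thin client code: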

// Start a thin client against the given server address
ClientConfiguration cCfg = new ClientConfiguration();
cCfg.setAddresses("XXX.XXX.XXX.XXX:yyyy");
IgniteClient igniteC = Ignition.startClient(cCfg);

Java thick client code:

// Load the client-mode configuration bean from the XML below and start a thick client node
IgniteConfiguration cfg = (IgniteConfiguration) fsxmlac.getBean("igniteClient.cfg");
ignite = Ignition.start(cfg);

Ignite configuration XML:

<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans.xsd">
    <!--
        Alter configuration below as needed.
    -->

    <bean class="org.apache.ignite.configuration.IgniteConfiguration" id="igniteClient.cfg">
        <property name="workDirectory" value="/ignite/work"/>
        <property name="clientMode" value="true" />
        <property name="dataStorageConfiguration" ref="dataStorageConfiguration" />
        <!--<property name="classLoader" ref="classLoader" /> -->
        <property name="igniteInstanceName" value="igniteClientInstance" />
        <property name="peerClassLoadingEnabled" value="false" />
        <property name="metricsLogFrequency" value="1000000" />
        <property name="communicationSpi" ref="communicationSpi" />
        <property name="discoverySpi" ref="discoverySpi" />
        <property name="cacheConfiguration">
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="session-cache"/>
                    <property name="cacheMode" value="PARTITIONED"/>
                        <property name="backups" value="1"/>
                            <!--
                            <property name="evictionPolicy">
                            <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicy">
                                <property name="maxSize" value="150000"/>
                            </bean>
                            </property>
                            -->

                </bean>
        </property>             
    </bean>

    <bean id="dataStorageConfiguration" class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
            </bean>
        </property>
        <property name="walPath" value="/ignite/wal"/>
        <property name="walArchivePath" value="/ignite/walarchive"/>
    </bean>
    
    <bean id="communicationSpi" class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <property name="slowClientQueueLimit" value="1000" />
        <property name="localPort" value="32609" />
    </bean>

    <bean id="discoverySpi" class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ackTimeout" value="1000"/>
        <property name="socketTimeout" value="2000"/>
        <property name="ipFinder" ref="ipFinder" />
    </bean>

    <bean id="ipFinder" class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
        <property name="shared" value="false" />
        <property name="addresses">
        <list>
            <value>XXX.XXX.XXX.XXX:yyyy</value>
        </list>
    </property>
    </bean>

    <!--<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration"/> 
    -->
</beans>

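For anyone who stumbles upon this: in the end, what I was trying to do is not possible. My Tomcat application runs outside Kubernetes, while my Ignite servers run inside it.

According to the link below, this setup prevents the use of a thick client. When a thick client starts, it tries to establish communication with every other Ignite server in the cluster, but the Kubernetes load balancer gets in the way.

That is why the discovery SPI was able to establish communication while the communication SPI failed.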

Ignite Kubernetes setups.
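
Since a thick client has to reach each server node directly, one quick way to see whether that is even possible from the machine outside Kubernetes is to probe the nodes' ports over plain TCP. The following is only a rough sketch using the JDK alone, not anything Ignite provides: the class name `PortCheck` and the helper `canConnect` are made up for illustration, and the host and port are whatever you pass on the command line. For reference, Ignite's default discovery port is 47500 and its default communication port is 47100, but your servers may advertise different ones.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Ignite: checks plain TCP reachability only.
class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java PortCheck <host> <port> [<port> ...]");
            return;
        }
        String host = args[0];
        // Probe every port given after the host, e.g. discovery and communication ports.
        for (int i = 1; i < args.length; i++) {
            int port = Integer.parseInt(args[i]);
            boolean ok = canConnect(host, port, 2000);
            System.out.println(host + ":" + port + " -> " + (ok ? "reachable" : "NOT reachable"));
        }
    }
}
```

Run from the Tomcat machine against the address a server node advertises, for example `java PortCheck.java <node-address> 47500 47100`. If only the load balancer address answers and the individual node addresses do not, that matches the situation described above. A successful TCP connect does not prove that the thick client will work; it only rules out the most basic connectivity problem.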

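From the error, I can see the following cause: java.lang.OutOfMemoryError: Java heap space

You can try the following settings: -Xms sets the initial Java heap size, and -Xmx sets the maximum Java heap size.

This may also solve the problem, if you are able to set the Java parameters.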
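
To confirm that the heap flags actually reach the JVM, a small check like the one below can help. It is a minimal sketch using only the JDK; the class name `HeapCheck` and the helper `toMiB` are made up for illustration, and the sizes in the usage line are example values, not a recommendation for an Ignite server.

```java
// Hypothetical helper: prints the heap limits the running JVM actually sees.
class HeapCheck {
    /** Converts a byte count to whole mebibytes (rounded down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx (it can be slightly lower, depending on the collector).
        System.out.println("Max heap:          " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory() is the heap currently committed; at startup it is close to -Xms.
        System.out.println("Committed heap:    " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("Free in committed: " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

For example, `java -Xms512m -Xmx2g HeapCheck.java` should report a maximum heap of roughly 2048 MiB. If the reported value does not change when you change the flags, the options are not being passed to the process you think they are, which is easy to get wrong when the JVM is started by a container entrypoint or by Tomcat's startup scripts.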