Apache Ignite cache freezes after a period of time

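I have a cluster with 6 server nodes and 1 client node. The client node runs a large number of update and create jobs, and there is also an expiry policy in place to catch expired entries.

However, the cluster hangs at least once a day. Even the ignitevisor cache command freezes when it is invoked.

So I looked at the thread dump and noticed something odd: there are many entries like this one: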

"pub-#39%null%" #51 prio=5 os_prio=0 tid=0x00007f9788623800 nid=0x1d02 waiting on condition [0x00007f9769ddc000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000006c004aaa8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

So many threads are waiting on a condition that, for some reason, is never signalled.

My cache configuration is as follows:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">
    <!--
        Alter configuration below as needed.
    -->
    <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">

        <!-- Configure internal thread pool. -->
        <property name="publicThreadPoolSize" value="64"/>

        <!-- Configure system thread pool. -->
        <property name="systemThreadPoolSize" value="32"/>

        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                            <list>
                                ...
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="asd1"/>
                    <property name="eagerTtl" value="true"/>
                    <property name="expiryPolicyFactory">
                        <bean class="javax.cache.configuration.FactoryBuilder.SingletonFactory">
                            <constructor-arg name="instance">
                                <bean class="javax.cache.expiry.TouchedExpiryPolicy">
                                    <constructor-arg name="expiryDuration">
                                        <bean class="javax.cache.expiry.Duration">
                                            <constructor-arg name="timeUnit">
                                                <value type="java.util.concurrent.TimeUnit">MILLISECONDS</value>
                                            </constructor-arg>
                                            <constructor-arg name="durationAmount" value="10800000"/>
                                        </bean>
                                    </constructor-arg>
                                </bean>
                            </constructor-arg>
                        </bean>
                    </property>
                </bean>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="asd2"/>
                    <property name="eagerTtl" value="true"/>
                    <property name="expiryPolicyFactory">
                        <bean class="javax.cache.configuration.FactoryBuilder.SingletonFactory">
                            <constructor-arg name="instance">
                                <bean class="javax.cache.expiry.TouchedExpiryPolicy">
                                    <constructor-arg name="expiryDuration">
                                        <bean class="javax.cache.expiry.Duration">
                                            <constructor-arg name="timeUnit">
                                                <value type="java.util.concurrent.TimeUnit">MILLISECONDS</value>
                                            </constructor-arg>
                                            <constructor-arg name="durationAmount" value="86400000"/>
                                        </bean>
                                    </constructor-arg>
                                </bean>
                            </constructor-arg>
                        </bean>
                    </property>
                </bean>
            </list>
        </property>
        <property name="includeEventTypes" value="70"/>

    </bean>

</beans>

I would really appreciate any help. Thanks.

This was discussed on the Apache Ignite users forum: http://apache-ignite-users.70518.x6.nabble.com/Apache-Ignite-cluster-freeze-after-a-period-of-time-td7726.html

The deadlock is most likely caused by keys being passed in a different order to concurrent putAll operations.
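
If inconsistent key order is indeed the cause, one common way to rule it out is to copy each batch into a sorted map before handing it to putAll, so that every caller presents keys in the same order. The following is only a minimal sketch in plain Java: the class name OrderedBatch and the helper ordered are hypothetical, the keys are assumed to be Comparable, and the actual cache call is left as a comment because the question does not show the client code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class OrderedBatch {
    /**
     * Copies a batch into a TreeMap so that iteration always follows
     * the natural ordering of the keys, regardless of the source map.
     */
    public static <K extends Comparable<K>, V> SortedMap<K, V> ordered(Map<K, V> batch) {
        return new TreeMap<>(batch);
    }

    public static void main(String[] args) {
        // A HashMap gives no ordering guarantee for its keys.
        Map<String, Integer> batch = new HashMap<>();
        batch.put("c", 3);
        batch.put("a", 1);
        batch.put("b", 2);

        SortedMap<String, Integer> sorted = ordered(batch);

        // Assumption: in the real client, this is where the batch would be
        // written, e.g. cache.putAll(sorted) on an IgniteCache instance.
        System.out.println(new ArrayList<>(sorted.keySet())); // prints [a, b, c]
    }
}
```

The copy costs one extra pass over the batch, which is usually negligible compared with the network round trip of the put itself. It only helps if every code path that writes batches to the same cache applies the same ordering.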