Ignite node crashed [ttl-cleanup-worker]
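I have Ignite 2.7 running as a 5-node cluster. More than 40 million records are being generated and stored in an Ignite cache, and I have set a 3-day expiry on them. Today one of the Ignite nodes stopped with the error below. Please help me identify and resolve the problem.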
[2019-09-11 07:45:59,570][ERROR][ttl-cleanup-worker-#170][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Unknown page type: 1 pageId: 000102210006d4ac]]
java.lang.IllegalStateException: Unknown page type: 1 pageId: 000102210006d4ac
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.io(BPlusTree.java:5058)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access0(BPlusTree.java:90)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5330)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5566)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2232)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:139)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[2019-09-11 07:45:59,575][WARN ][ttl-cleanup-worker-#170][FailureProcessor] No deadlocked threads detected.
[2019-09-11 07:46:40,831][WARN ][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 41233 milliseconds.
[2019-09-11 07:46:40,831][ERROR][sys-stripe-0-#1][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=grid-nio-worker-tcp-comm-23, blockedFor=41s]
[2019-09-11 07:46:40,832][WARN ][sys-stripe-0-#1][G] Thread [name="grid-nio-worker-tcp-comm-23-#143", id=173, state=RUNNABLE, blockCnt=0, waitCnt=0]
The Ignite configuration is:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- Enabling native persistence -->
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="metricsEnabled" value="true"/>
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
</bean>
</property>
<property name="storagePath" value="/ignite_data/ignite/persistance"/>
<property name="walPath" value="/ignite_data/ignite/wal"/>
<property name="walArchivePath" value="/data/disk01/ignite/archive"/>
</bean>
</property>
<!-- Enable authentication for ignite-->
<property name="authenticationEnabled" value="true"/>
<!-- Enabling expiry policy -->
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="CACHE_L4_TRIGGER_NOTIFICATION"/>
<property name="expiryPolicyFactory">
<bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
<constructor-arg>
<bean class="javax.cache.expiry.Duration">
<constructor-arg value="DAYS"/>
<constructor-arg value="3"/>
</bean>
</constructor-arg>
</bean>
</property>
</bean>
</list>
</property>
<!-- Log Ignite metrics every 10 minutes -->
<property name="gridLogger">
<bean class="org.apache.ignite.logger.log4j.Log4JLogger">
<constructor-arg type="java.lang.String" value="/home/trigger_be/apache-ignite-2.7.0/config/log4j.xml"/>
</bean>
</property>
<property name="metricsLogFrequency" value="#{60 * 10 * 1000}"/>
<!-- Set Cluster by giving IPs-->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
<property name="addresses">
<list>
<value>172.16.5.36:49500..49509</value>
<value>172.16.5.37:49500..49509</value>
<value>172.16.5.38:49500..49509</value>
<value>172.16.5.39:49500..49509</value>
<value>172.16.5.40:49500..49509</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
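This looks like a data corruption problem. I recommend completely removing the persistence data from this node and then adding the node back to the cluster's baseline topology. If you have enough backups, the data will be rebalanced to it.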
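On the "enough backups" condition: the configuration in the question does not set `backups` on the cache, and Ignite's default is 0, so entries whose only copy lived on the failed node could not be rebalanced from the other nodes. A minimal sketch of the same cache bean with one backup copy follows. The value `1` is an assumption chosen for illustration, not something stated in the question.

```xml
<!-- Sketch only: the cache bean from the question with a backup copy added. -->
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="CACHE_L4_TRIGGER_NOTIFICATION"/>
    <!-- Assumption: one backup per partition; the question's config sets none (default 0). -->
    <property name="backups" value="1"/>
    <!-- Expiry policy copied unchanged from the question. -->
    <property name="expiryPolicyFactory">
        <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
            <constructor-arg>
                <bean class="javax.cache.expiry.Duration">
                    <constructor-arg value="DAYS"/>
                    <constructor-arg value="3"/>
                </bean>
            </constructor-arg>
        </bean>
    </property>
</bean>
```

Note that with native persistence enabled, a cache that already exists will most likely keep its stored configuration, so editing the XML alone may not change the backup count of the existing cache. Treat this as what a newly created cache could look like.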
This looks somewhat like issue IGNITE-10767. Do you have MVCC enabled (transactional SQL, TRANSACTIONAL_SNAPSHOT caches)?
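For reference when checking this, an MVCC-enabled cache declared in Spring XML would carry an `atomicityMode` of `TRANSACTIONAL_SNAPSHOT`, roughly as sketched below. The cache name here is hypothetical. The configuration posted in the question sets no `atomicityMode`, so that particular cache would use the default (`ATOMIC`).

```xml
<!-- Sketch only: what an MVCC-enabled cache declaration would look like. -->
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <!-- Hypothetical cache name, used only for illustration. -->
    <property name="name" value="SOME_MVCC_CACHE"/>
    <!-- TRANSACTIONAL_SNAPSHOT is the atomicity mode that turns on MVCC in Ignite 2.7. -->
    <property name="atomicityMode" value="TRANSACTIONAL_SNAPSHOT"/>
</bean>
```

Caches or tables created outside this file (for example through SQL or from client code) would not show up here, so it is worth checking those as well.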