cluster-wide undefined behaviour [threadName=data-streamer-stripe
I am trying to load ca. 5'000'0000 records. After a while (500'000 records) I get the following message:
SEVERE: Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-2, blockedFor=17s]
rto_1 | Mar 08, 2020 5:02:08 PM java.util.logging.LogManager$RootLogger log
rto_1 | SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-2, igniteInstanceName=39d7b944-fb1a-4413-80a6-a8e42679965a, finished=false, heartbeatTs=1583686911068]]]
rto_1 | class org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-2, igniteInstanceName=39d7b944-fb1a-4413-80a6-a8e42679965a, finished=false, heartbeatTs=1583686911068]
rto_1 |     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
rto_1 |     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
rto_1 |     at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
rto_1 |     at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
rto_1 |     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
rto_1 |     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
rto_1 |     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
rto_1 |     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
rto_1 |     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
rto_1 |     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
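I first ran the import on the server node and then switched to a dedicated client node. Regardless of the setup, the streamer threads seem to stall suddenly for several seconds. I also tried setting DataStreamerThreadPoolSize to 4 and StreamerNodeBufferSize to 200 so that writes would complete faster, without any success.
Any suggestions on how to solve this?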
I think this may be related to the "Critical worker threads liveness check".
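As a rough way to read what the liveness check is reporting, the heartbeatTs value in the log can be compared with the time of the log line. The sketch below assumes that heartbeatTs is epoch milliseconds and that the container logs in UTC; neither is stated in the question, and the class and method names are made up for illustration. Under those assumptions the last heartbeat falls about 17 seconds before the log line, which is consistent with blockedFor=17s.

```java
import java.time.Duration;
import java.time.Instant;

public class HeartbeatCheck {
    /** Interprets a heartbeatTs value as epoch milliseconds (an assumption, not confirmed by the log). */
    public static Instant heartbeatInstant(long heartbeatTs) {
        return Instant.ofEpochMilli(heartbeatTs);
    }

    /** Time elapsed between the last recorded heartbeat and the moment the log line was written. */
    public static Duration sinceHeartbeat(long heartbeatTs, Instant loggedAt) {
        return Duration.between(heartbeatInstant(heartbeatTs), loggedAt);
    }

    public static void main(String[] args) {
        // Value copied from the log in the question.
        long heartbeatTs = 1583686911068L;

        // "Mar 08, 2020 5:02:08 PM" from the log, assumed to be UTC.
        Instant loggedAt = Instant.parse("2020-03-08T17:02:08Z");

        System.out.println("last heartbeat: " + heartbeatInstant(heartbeatTs));
        System.out.println("gap to log line: " + sinceHeartbeat(heartbeatTs, loggedAt).toMillis() + " ms");
    }
}
```

The log timestamp only has second resolution, so the computed gap is approximate.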
Try setting suitable values for one or more of the following options.
1) See whether you can disable disk persistence (if it is enabled).
// Flush buffered data to the nodes at this interval, in milliseconds
2) IgniteDataStreamer.autoFlushFrequency(100);
// Maximum number of parallel stream operations for a single node
3) IgniteDataStreamer.perNodeParallelOperations(48);
// Skip write-through to the underlying cache store
4) IgniteDataStreamer.skipStore(true);
// true allows existing entries to be overwritten; false does not
5) IgniteDataStreamer.allowOverwrite(true);
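The calls above are instance methods on a streamer obtained from a running node, so in practice they might be applied roughly as in the following sketch. It needs ignite-core on the classpath; the cache name "records", the key and value types, and the loop are placeholders for illustration, and the numeric values are simply the ones listed above, not tuned recommendations.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerSettingsSketch {
    public static void main(String[] args) {
        // Starts a node with the default configuration; a real setup would pass its own.
        try (Ignite ignite = Ignition.start()) {
            // "records" is a hypothetical cache name used only in this sketch.
            ignite.getOrCreateCache("records");

            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("records")) {
                // Settings 2) to 5) from the list above; set them before adding data.
                streamer.autoFlushFrequency(100);       // flush buffers every 100 ms
                streamer.perNodeParallelOperations(48); // parallel batches per node
                streamer.skipStore(true);               // no write-through to a cache store
                streamer.allowOverwrite(true);          // existing keys may be replaced

                // Placeholder data; the real import would read its own records here.
                for (long key = 0; key < 1_000; key++)
                    streamer.addData(key, "value-" + key);
            } // closing the streamer flushes whatever is still buffered
        }
    }
}
```

If the import only ever inserts new keys, it may be worth testing with allowOverwrite left at its default of false as well, since that is the mode the streamer is primarily designed for.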
If none of these reveals the root cause, at least use the critical failure handling to narrow it down:
https://apacheignite.readme.io/docs/critical-failures-handling
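For reference, the liveness check and the failure handler are both set on IgniteConfiguration. The sketch below is a minimal example that assumes ignite-core is on the classpath; the 60-second timeout is an arbitrary illustrative value, not a recommendation. Note that the handler printed in the log above already lists SYSTEM_WORKER_BLOCKED among its ignoredFailureTypes, so with that handler the event is reported but should not by itself stop the node.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class LivenessCheckSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // How long a system worker may go without a heartbeat before it is
        // reported as blocked. 60 s is an illustrative value only.
        cfg.setSystemWorkerBlockedTimeout(60_000L);

        // Same handler settings as shown in the log (tryStop=false, timeout=0).
        cfg.setFailureHandler(new StopNodeOrHaltFailureHandler(false, 0));

        try (Ignite ignite = Ignition.start(cfg)) {
            System.out.println("Node started: " + ignite.name());
        }
    }
}
```

Raising the timeout only hides the warning; if a streamer stripe really stalls for many seconds, the cause (for example long GC pauses or slow disk writes) still needs to be found.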