由于 8478595263 > 5000000000 的本地暂停,未将节点标记为关闭
Not marking nodes down due to local pause of 8478595263 > 5000000000
我在 kubernetes 中有 3 个节点的 cassandra 集群。
使用 bitnami/cassandra helm chart 部署了 cassandra。
一段时间后根据更多请求数出现错误
WARN [GossipTasks:1] 2020-01-09 11:39:33,070 FailureDetector.java:278 - Not marking nodes down due to local pause of 8206335128 > 5000000000
WARN [GossipTasks:1] 2020-01-09 11:39:42,238 FailureDetector.java:278 - Not marking nodes down due to local pause of 6668041401 > 5000000000
WARN [GossipTasks:1] 2020-01-09 11:40:03,341 FailureDetector.java:278 - Not marking nodes down due to local pause of 15041441083 > 5000000000
WARN [PERIODIC-COMMIT-LOG-SYNCER] 2020-01-09 11:41:55,606 NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with average duration of 11850.79ms, 1 have exceeded the configured commit interval by an average of 1850.79ms
WARN [GossipTasks:1] 2020-01-09 11:42:20,019 Gossiper.java:783 - Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
NFO [RequestResponseStage-1] 2020-01-09 11:45:36,329 Gossiper.java:1011 - InetAddress /100.96.7.7 is now UP
INFO [RequestResponseStage-1] 2020-01-09 11:45:36,330 Gossiper.java:1011 - InetAddress /100.96.7.7 is now UP
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,931 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 0 internal and 45 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 2874 ms
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,933 StatusLogger.java:47 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,949 StatusLogger.java:51 - MutationStage 0 0 226236 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,950 StatusLogger.java:51 - ViewMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,950 StatusLogger.java:51 - ReadStage 0 0 244468 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,951 StatusLogger.java:51 - RequestResponseStage 0 0 341270 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,952 StatusLogger.java:51 - ReadRepairStage 0 0 5395 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,953 StatusLogger.java:51 - CounterMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,958 StatusLogger.java:51 - MiscStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,959 StatusLogger.java:51 - CompactionExecutor 0 0 686641 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,960 StatusLogger.java:51 - MemtableReclaimMemory 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,962 StatusLogger.java:51 - PendingRangeCalculator 0 0 9 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,964 StatusLogger.java:51 - GossipStage 0 0 3093860 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,966 StatusLogger.java:51 - SecondaryIndexManagement 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,970 StatusLogger.java:51 - HintsDispatcher 0 0 10 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,973 StatusLogger.java:51 - MigrationStage 0 0 6 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,973 StatusLogger.java:51 - MemtablePostFlush 0 0 717 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 689 0 0
:
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - ValidationExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,975 StatusLogger.java:51 - Sampler 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,975 StatusLogger.java:51 - MemtableFlushWriter 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,976 StatusLogger.java:51 - InternalResponseStage 0 0 869 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,977 StatusLogger.java:51 - AntiEntropyStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,978 StatusLogger.java:51 - CacheCleanupExecutor 0 0 0 0 0
INFO
INFO [Service Thread] 2020-01-09 12:11:49,292 GCInspector.java:284 - ParNew GC in 659ms. CMS Old Gen: 2056877512 -> 2057740336; Par Eden Space: 671088640 -> 0; Par Survivor Space: 2636992 -> 6187520
尝试根据一些参考问题解决但没有为 kubernetes 提供
Cassandra Error message: Not marking nodes down due to local pause. Why?
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 245904 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 696906 0 0
MutationStage 0 0 244820 0 0
MemtableReclaimMemory 0 0 697 0 0
PendingRangeCalculator 0 0 9 0 0
GossipStage 0 0 3138625 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 10 0 0
RequestResponseStage 0 0 364305 0 0
Native-Transport-Requests 0 0 11089339 0 241
ReadRepairStage 0 0 5395 0 0
CounterMutationStage 0 0 0 0 0
MigrationStage 0 0 6 0 0
MemtablePostFlush 0 0 725 0 0
PerDiskMemtableFlushWriter_0 0 0 697 0 0
ValidationExecutor 0 0 0 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 697 0 0
InternalResponseStage 0 0 869 0 0
ViewMutationStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
CacheCleanupExecutor 0 0 0 0 0
Message type Dropped
READ 0
RANGE_SLICE 0
_TRACE 0
HINT 0
MUTATION 45
COUNTER_MUTATION 0
BATCH_STORE 0
BATCH_REMOVE 0
REQUEST_RESPONSE 0
PAGED_RANGE 0
READ_REPAIR 0
根据您在上面发布的日志条目,节点过载导致它们无响应。突变被丢弃,因为提交日志磁盘跟不上写入。
您需要查看集群的大小,因为您可能需要添加更多节点以增加容量。干杯!
从上面的 "tpstats" 指标来看,指标看起来不错,但我们可以看到那里有一些变化,这表明您的集群正在过载。一些请求也在那里被阻止。提交日志似乎不接受那里的写入。您应该计划集群扩展或开始调试节点超载的原因。
我在 kubernetes 中有 3 个节点的 cassandra 集群。 使用 bitnami/cassandra helm chart 部署了 cassandra。
一段时间后根据更多请求数出现错误
WARN [GossipTasks:1] 2020-01-09 11:39:33,070 FailureDetector.java:278 - Not marking nodes down due to local pause of 8206335128 > 5000000000
WARN [GossipTasks:1] 2020-01-09 11:39:42,238 FailureDetector.java:278 - Not marking nodes down due to local pause of 6668041401 > 5000000000
WARN [GossipTasks:1] 2020-01-09 11:40:03,341 FailureDetector.java:278 - Not marking nodes down due to local pause of 15041441083 > 5000000000
WARN [PERIODIC-COMMIT-LOG-SYNCER] 2020-01-09 11:41:55,606 NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with average duration of 11850.79ms, 1 have exceeded the configured commit interval by an average of 1850.79ms
WARN [GossipTasks:1] 2020-01-09 11:42:20,019 Gossiper.java:783 - Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
NFO [RequestResponseStage-1] 2020-01-09 11:45:36,329 Gossiper.java:1011 - InetAddress /100.96.7.7 is now UP
INFO [RequestResponseStage-1] 2020-01-09 11:45:36,330 Gossiper.java:1011 - InetAddress /100.96.7.7 is now UP
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,931 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 0 internal and 45 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 2874 ms
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,933 StatusLogger.java:47 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,949 StatusLogger.java:51 - MutationStage 0 0 226236 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,950 StatusLogger.java:51 - ViewMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,950 StatusLogger.java:51 - ReadStage 0 0 244468 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,951 StatusLogger.java:51 - RequestResponseStage 0 0 341270 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,952 StatusLogger.java:51 - ReadRepairStage 0 0 5395 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,953 StatusLogger.java:51 - CounterMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,958 StatusLogger.java:51 - MiscStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,959 StatusLogger.java:51 - CompactionExecutor 0 0 686641 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,960 StatusLogger.java:51 - MemtableReclaimMemory 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,962 StatusLogger.java:51 - PendingRangeCalculator 0 0 9 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,964 StatusLogger.java:51 - GossipStage 0 0 3093860 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,966 StatusLogger.java:51 - SecondaryIndexManagement 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,970 StatusLogger.java:51 - HintsDispatcher 0 0 10 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,973 StatusLogger.java:51 - MigrationStage 0 0 6 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,973 StatusLogger.java:51 - MemtablePostFlush 0 0 717 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 689 0 0
:
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,974 StatusLogger.java:51 - ValidationExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,975 StatusLogger.java:51 - Sampler 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,975 StatusLogger.java:51 - MemtableFlushWriter 0 0 689 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,976 StatusLogger.java:51 - InternalResponseStage 0 0 869 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,977 StatusLogger.java:51 - AntiEntropyStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-01-09 11:45:55,978 StatusLogger.java:51 - CacheCleanupExecutor 0 0 0 0 0
INFO
INFO [Service Thread] 2020-01-09 12:11:49,292 GCInspector.java:284 - ParNew GC in 659ms. CMS Old Gen: 2056877512 -> 2057740336; Par Eden Space: 671088640 -> 0; Par Survivor Space: 2636992 -> 6187520
尝试根据一些参考问题解决但没有为 kubernetes 提供 Cassandra Error message: Not marking nodes down due to local pause. Why?
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 245904 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 696906 0 0
MutationStage 0 0 244820 0 0
MemtableReclaimMemory 0 0 697 0 0
PendingRangeCalculator 0 0 9 0 0
GossipStage 0 0 3138625 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 10 0 0
RequestResponseStage 0 0 364305 0 0
Native-Transport-Requests 0 0 11089339 0 241
ReadRepairStage 0 0 5395 0 0
CounterMutationStage 0 0 0 0 0
MigrationStage 0 0 6 0 0
MemtablePostFlush 0 0 725 0 0
PerDiskMemtableFlushWriter_0 0 0 697 0 0
ValidationExecutor 0 0 0 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 697 0 0
InternalResponseStage 0 0 869 0 0
ViewMutationStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
CacheCleanupExecutor 0 0 0 0 0
Message type Dropped
READ 0
RANGE_SLICE 0
_TRACE 0
HINT 0
MUTATION 45
COUNTER_MUTATION 0
BATCH_STORE 0
BATCH_REMOVE 0
REQUEST_RESPONSE 0
PAGED_RANGE 0
READ_REPAIR 0
根据您在上面发布的日志条目,节点过载导致它们无响应。突变被丢弃,因为提交日志磁盘跟不上写入。
您需要查看集群的大小,因为您可能需要添加更多节点以增加容量。干杯!
从上面的 "tpstats" 指标来看,指标看起来不错,但我们可以看到那里有一些变化,这表明您的集群正在过载。一些请求也在那里被阻止。提交日志似乎不接受那里的写入。您应该计划集群扩展或开始调试节点超载的原因。