麻烦跟踪卡桑德拉查询
Trouble tracing cassandra query
我在使用 Cassandra 2.1.5 时遇到了非常糟糕的性能。我对此很陌生,所以希望得到有关如何调试的任何建议。这是我的 table 的样子:
Keyspace: nt_live_october x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Read Count: 6 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Read Latency: 20837.149166666666 ms. x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Write Count: 39799 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Write Latency: 0.45696595391844014 ms. x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pending Flushes: 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Table: nt x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SSTable count: 12 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Space used (live): 15903191275 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Space used (total): 15971044770 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Space used by snapshots (total): 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Off heap memory used (total): 14468424 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SSTable Compression Ratio: 0.1308103413354315 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of keys (estimate): 740 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memtable cell count: 43483 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memtable data size: 9272510 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memtable off heap memory used: 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memtable switch count: 17 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Local read count: 6 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Local read latency: 20837.150 ms x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Local write count: 39801 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Local write latency: 0.457 ms x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pending flushes: 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bloom filter false positives: 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bloom filter false ratio: 0.00000 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bloom filter space used: 4832 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bloom filter off heap memory used: 4736 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Index summary off heap memory used: 576 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compression metadata off heap memory used: 14463112 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compacted partition minimum bytes: 6867 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compacted partition maximum bytes: 30753941057 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compacted partition mean bytes: 44147544 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Average live cells per slice (last five minutes): 0.0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maximum live cells per slice (last five minutes): 0.0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Average tombstones per slice (last five minutes): 0.0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maximum tombstones per slice (last five minutes): 0.0
我正在通过 cqlsh 发出以下查询:
cassandra@cqlsh> TRACING ON; Tracing is already enabled. Use TRACING OFF to disable. x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> CONSISTENCY; x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current consistency level is ONE. x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> select * from nt_live_october.nt where group_id='254358' and epoch >=1444313898 and epoch<=1444348800 LIMIT 1; x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OperationTimedOut: errors={}, last_host=XXX.203
Statement trace did not complete within 10 seconds
下面是 system_traces.events 显示的内容:
xxx.xxx.xxx.203 | 1281 | Parsing select * from nt_live_october.nt where group_id='254358'\nand epoch >=1443916800 and epoch<=1444348800\nLIMIT 30;
xxx.xxx.xxx.203 | 2604 | Preparing statement
xxx.xxx.xxx.203 | 8454 | Executing single-partition query on users
xxx.xxx.xxx.203 | 8474 | Acquiring sstable references
xxx.xxx.xxx.203 | 8547 | Merging memtable tombstones
xxx.xxx.xxx.203 | 8675 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 8685 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 9040 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 9056 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 9120 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 9854 | Read-repair DC_LOCAL
xxx.xxx.xxx.203 | 10033 | Executing single-partition query on users
xxx.xxx.xxx.203 | 10046 | Acquiring sstable references
xxx.xxx.xxx.203 | 10105 | Merging memtable tombstones
xxx.xxx.xxx.203 | 10189 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 10198 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 10248 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 10261 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 10296 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 12511 | Executing single-partition query on nt
xxx.xxx.xxx.203 | 12525 | Acquiring sstable references
xxx.xxx.xxx.203 | 12587 | Merging memtable tombstones
xxx.xxx.xxx.203 | 18067 | speculating read retry on /xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 18577 | Sending READ message to xxx.xxx.xxx.205/xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 25534 | Partition index with 6093 entries found for sstable 8885
xxx.xxx.xxx.203 | 25571 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 34989 | Partition index with 5327 entries found for sstable 8524
xxx.xxx.xxx.203 | 35022 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 36322 | Partition index with 333 entries found for sstable 8477
xxx.xxx.xxx.203 | 36336 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 714242 | Partition index with 299251 entries found for sstable 8541
xxx.xxx.xxx.203 | 714279 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 715717 | Partition index with 501 entries found for sstable 8217
xxx.xxx.xxx.203 | 715745 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 716232 | Partition index with 252 entries found for sstable 8888
xxx.xxx.xxx.203 | 716245 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 87 | READ message received from /xxx.xxx.xxx.203
xxx.xxx.xxx.205 | 50427 | Executing single-partition query on nt
xxx.xxx.xxx.205 | 50535 | Acquiring sstable references
xxx.xxx.xxx.205 | 50628 | Merging memtable tombstones
xxx.xxx.xxx.205 | 170441 | Partition index with 35650 entries found for sstable 6332
xxx.xxx.xxx.203 | 30718026 | Partition index with 199905 entries found for sstable 5958
xxx.xxx.xxx.203 | 30718077 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 170499 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 248898 | Partition index with 30958 entries found for sstable 6797
xxx.xxx.xxx.205 | 248962 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 67814573 | Read timeout: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
xxx.xxx.xxx.203 | 67814675 | Timed out; received 0 of 1 responses
我有 4 个节点,复制因子为 3(一个节点很轻,但不是 .203)我尝试读取的数据不是很多——即使没有推送 LIMIT 1到远程节点,间隔的低端应该是大约 3 小时前(我没有超过当前时间的纪元)
关于如何修复 this/what 可能出错的任何提示?我的 cassandra 版本是 2.1.9,运行 大部分都是默认值
Table 架构如下(出于隐私原因,我无法发布整个架构,但显示密钥是我希望最重要的事情)
PRIMARY KEY (group_id, epoch, group_name, auto_generated_uuid_field)
) WITH CLUSTERING ORDER BY (epoch ASC, group_name ASC, auto_generated_uuid_field ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 7776000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
___________EDIT_____________
回答以下问题:
状态输出:
-- Address Load Tokens Owns Host ID Rack x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DN xxx.xxx.xxx.204 15.8 GB 1 ? 32ed196b-f6eb-4e93-b759 r1 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN xxx.xxx.xxx.205 20.38 GB 1 ? 446d71aa-e9cd-4ca9-a6ac r1 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN xxx.xxx.xxx.202 1.48 GB 1 ? 2a6670b2-63f2-43be-b672 r1 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN xxx.xxx.xxx.203 15.72 GB 1 ? dd26dfee-82da-454b-8db2 r1
system.log 比较棘手,因为我在那里登录了很多...我看到的一个可疑的事情是
WARN [CompactionExecutor:6] 2015-10-08 19:44:16,595 SSTableWriter.java (line 240) Compacting large partition nt_live_october/nt:254358 (230692316 bytes)
但这只是一个警告...我看到后不久
INFO [CompactionExecutor:6] 2015-10-08 19:44:16,642 CompactionTask.java (line 274) Compacted 4 sstables to [/cassandra/data_dir_d/nt_live_october/nt-72813b106b9111e58f1ea1f0942ab78d/nt_live_october-nt-ka-9024,]. 35,733,701 bytes to 30,186,394 (~84% of original) in 34,907ms = 0.824705MB/s. 21 total partitions merged to 18. Partition merge counts were {1:17, 4:1, }
我在日志中看到了很多这样的对...但是没有错误级别的消息。压缩似乎进展顺利..它确实说这是最大的列族,但所有消息都是 INFO 级别....
首先,节点204的DN状态为down。检索它的 system.log 并寻找 :
- 异常和错误级别日志
- 异常 GC activity(收集时间超过 200 毫秒)
- 状态记录器
其次,数据在集群中分布不均。 202 的负载只有 1.48 GB。我怀疑您在其他节点上复制了一些非常大的分区。什么是复制因子?您的密钥空间方案是什么?您可以使用 cqlsh 命令回答这些问题:
DESCRIBE KEYSPACE nt_live_october;
我在使用 Cassandra 2.1.5 时遇到了非常糟糕的性能。我对此很陌生,所以希望得到有关如何调试的任何建议。这是我的 table 的样子:
Keyspace: nt_live_october x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Read Count: 6 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Read Latency: 20837.149166666666 ms. x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Write Count: 39799 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Write Latency: 0.45696595391844014 ms. x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pending Flushes: 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Table: nt x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SSTable count: 12 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Space used (live): 15903191275 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Space used (total): 15971044770 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Space used by snapshots (total): 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Off heap memory used (total): 14468424 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SSTable Compression Ratio: 0.1308103413354315 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of keys (estimate): 740 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memtable cell count: 43483 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memtable data size: 9272510 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memtable off heap memory used: 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Memtable switch count: 17 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Local read count: 6 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Local read latency: 20837.150 ms x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Local write count: 39801 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Local write latency: 0.457 ms x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pending flushes: 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bloom filter false positives: 0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bloom filter false ratio: 0.00000 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bloom filter space used: 4832 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bloom filter off heap memory used: 4736 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Index summary off heap memory used: 576 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compression metadata off heap memory used: 14463112 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compacted partition minimum bytes: 6867 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compacted partition maximum bytes: 30753941057 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compacted partition mean bytes: 44147544 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Average live cells per slice (last five minutes): 0.0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maximum live cells per slice (last five minutes): 0.0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Average tombstones per slice (last five minutes): 0.0 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maximum tombstones per slice (last five minutes): 0.0
我正在通过 cqlsh 发出以下查询:
cassandra@cqlsh> TRACING ON; Tracing is already enabled. Use TRACING OFF to disable. x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> CONSISTENCY; x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current consistency level is ONE. x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> select * from nt_live_october.nt where group_id='254358' and epoch >=1444313898 and epoch<=1444348800 LIMIT 1; x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OperationTimedOut: errors={}, last_host=XXX.203
Statement trace did not complete within 10 seconds
下面是 system_traces.events 显示的内容:
xxx.xxx.xxx.203 | 1281 | Parsing select * from nt_live_october.nt where group_id='254358'\nand epoch >=1443916800 and epoch<=1444348800\nLIMIT 30;
xxx.xxx.xxx.203 | 2604 | Preparing statement
xxx.xxx.xxx.203 | 8454 | Executing single-partition query on users
xxx.xxx.xxx.203 | 8474 | Acquiring sstable references
xxx.xxx.xxx.203 | 8547 | Merging memtable tombstones
xxx.xxx.xxx.203 | 8675 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 8685 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 9040 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 9056 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 9120 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 9854 | Read-repair DC_LOCAL
xxx.xxx.xxx.203 | 10033 | Executing single-partition query on users
xxx.xxx.xxx.203 | 10046 | Acquiring sstable references
xxx.xxx.xxx.203 | 10105 | Merging memtable tombstones
xxx.xxx.xxx.203 | 10189 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 10198 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 10248 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 10261 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 10296 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 12511 | Executing single-partition query on nt
xxx.xxx.xxx.203 | 12525 | Acquiring sstable references
xxx.xxx.xxx.203 | 12587 | Merging memtable tombstones
xxx.xxx.xxx.203 | 18067 | speculating read retry on /xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 18577 | Sending READ message to xxx.xxx.xxx.205/xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 25534 | Partition index with 6093 entries found for sstable 8885
xxx.xxx.xxx.203 | 25571 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 34989 | Partition index with 5327 entries found for sstable 8524
xxx.xxx.xxx.203 | 35022 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 36322 | Partition index with 333 entries found for sstable 8477
xxx.xxx.xxx.203 | 36336 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 714242 | Partition index with 299251 entries found for sstable 8541
xxx.xxx.xxx.203 | 714279 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 715717 | Partition index with 501 entries found for sstable 8217
xxx.xxx.xxx.203 | 715745 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 716232 | Partition index with 252 entries found for sstable 8888
xxx.xxx.xxx.203 | 716245 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 87 | READ message received from /xxx.xxx.xxx.203
xxx.xxx.xxx.205 | 50427 | Executing single-partition query on nt
xxx.xxx.xxx.205 | 50535 | Acquiring sstable references
xxx.xxx.xxx.205 | 50628 | Merging memtable tombstones
xxx.xxx.xxx.205 | 170441 | Partition index with 35650 entries found for sstable 6332
xxx.xxx.xxx.203 | 30718026 | Partition index with 199905 entries found for sstable 5958
xxx.xxx.xxx.203 | 30718077 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 170499 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 248898 | Partition index with 30958 entries found for sstable 6797
xxx.xxx.xxx.205 | 248962 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 67814573 | Read timeout: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
xxx.xxx.xxx.203 | 67814675 | Timed out; received 0 of 1 responses
我有 4 个节点,复制因子为 3(一个节点很轻,但不是 .203)我尝试读取的数据不是很多——即使没有推送 LIMIT 1到远程节点,间隔的低端应该是大约 3 小时前(我没有超过当前时间的纪元)
关于如何修复 this/what 可能出错的任何提示?我的 cassandra 版本是 2.1.9,运行 大部分都是默认值
Table 架构如下(出于隐私原因,我无法发布整个架构,但显示密钥是我希望最重要的事情)
PRIMARY KEY (group_id, epoch, group_name, auto_generated_uuid_field)
) WITH CLUSTERING ORDER BY (epoch ASC, group_name ASC, auto_generated_uuid_field ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 7776000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
___________EDIT_____________ 回答以下问题:
状态输出:
-- Address Load Tokens Owns Host ID Rack x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DN xxx.xxx.xxx.204 15.8 GB 1 ? 32ed196b-f6eb-4e93-b759 r1 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN xxx.xxx.xxx.205 20.38 GB 1 ? 446d71aa-e9cd-4ca9-a6ac r1 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN xxx.xxx.xxx.202 1.48 GB 1 ? 2a6670b2-63f2-43be-b672 r1 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN xxx.xxx.xxx.203 15.72 GB 1 ? dd26dfee-82da-454b-8db2 r1
system.log 比较棘手,因为我在那里登录了很多...我看到的一个可疑的事情是
WARN [CompactionExecutor:6] 2015-10-08 19:44:16,595 SSTableWriter.java (line 240) Compacting large partition nt_live_october/nt:254358 (230692316 bytes)
但这只是一个警告...我看到后不久
INFO [CompactionExecutor:6] 2015-10-08 19:44:16,642 CompactionTask.java (line 274) Compacted 4 sstables to [/cassandra/data_dir_d/nt_live_october/nt-72813b106b9111e58f1ea1f0942ab78d/nt_live_october-nt-ka-9024,]. 35,733,701 bytes to 30,186,394 (~84% of original) in 34,907ms = 0.824705MB/s. 21 total partitions merged to 18. Partition merge counts were {1:17, 4:1, }
我在日志中看到了很多这样的对...但是没有错误级别的消息。压缩似乎进展顺利..它确实说这是最大的列族,但所有消息都是 INFO 级别....
首先,节点204的DN状态为down。检索它的 system.log 并寻找 :
- 异常和错误级别日志
- 异常 GC activity(收集时间超过 200 毫秒)
- 状态记录器
其次,数据在集群中分布不均。 202 的负载只有 1.48 GB。我怀疑您在其他节点上复制了一些非常大的分区。什么是复制因子?您的密钥空间方案是什么?您可以使用 cqlsh 命令回答这些问题:
DESCRIBE KEYSPACE nt_live_october;