麻烦跟踪卡桑德拉查询

Question

我在使用 Cassandra 2.1.5 时遇到了非常糟糕的性能。我对此很陌生，所以希望得到有关如何调试的任何建议。这是我的 table 的样子：

Keyspace: nt_live_october                                                                                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Read Count: 6                                                                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Read Latency: 20837.149166666666 ms.                                                                                                                                                                 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Write Count: 39799                                                                                                                                                                                   x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Write Latency: 0.45696595391844014 ms.                                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Pending Flushes: 0                                                                                                                                                                                   x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Table: nt                                                                                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            SSTable count: 12                                                                                                                                                                            x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Space used (live): 15903191275                                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Space used (total): 15971044770                                                                                                                                                              x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Space used by snapshots (total): 0                                                                                                                                                           x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Off heap memory used (total): 14468424                                                                                                                                                       x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            SSTable Compression Ratio: 0.1308103413354315                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Number of keys (estimate): 740                                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable cell count: 43483                                                                                                                                                                   x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable data size: 9272510                                                                                                                                                                  x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable off heap memory used: 0                                                                                                                                                             x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable switch count: 17                                                                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local read count: 6                                                                                                                                                                          x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local read latency: 20837.150 ms                                                                                                                                                             x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local write count: 39801                                                                                                                                                                     x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local write latency: 0.457 ms                                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Pending flushes: 0                                                                                                                                                                           x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter false positives: 0                                                                                                                                                              x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter false ratio: 0.00000                                                                                                                                                            x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter space used: 4832                                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter off heap memory used: 4736                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Index summary off heap memory used: 576                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compression metadata off heap memory used: 14463112                                                                                                                                          x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compacted partition minimum bytes: 6867                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compacted partition maximum bytes: 30753941057                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compacted partition mean bytes: 44147544                                                                                                                                                     x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Average live cells per slice (last five minutes): 0.0                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Maximum live cells per slice (last five minutes): 0.0                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Average tombstones per slice (last five minutes): 0.0                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Maximum tombstones per slice (last five minutes): 0.0

我正在通过 cqlsh 发出以下查询：

cassandra@cqlsh> TRACING ON;                                                                                                                                                                                          Tracing is already enabled. Use TRACING OFF to disable.                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> CONSISTENCY;                                                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current consistency level is ONE.                                                                                                                                                                            x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> select * from nt_live_october.nt where group_id='254358' and epoch >=1444313898 and epoch<=1444348800 LIMIT 1;                                                                              x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OperationTimedOut: errors={}, last_host=XXX.203
Statement trace did not complete within 10 seconds

下面是 system_traces.events 显示的内容：

xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | 1281 | Parsing select * from nt_live_october.nt where group_id='254358'\nand epoch >=1443916800 and epoch<=1444348800\nLIMIT 30;
2604 | Preparing statement
8454 | Executing single-partition query on users
8474 | Acquiring sstable references
8547 | Merging memtable tombstones
8675 | Key cache hit for sstable 1
8685 | Seeking to partition beginning in data file
9040 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
9056 | Merging data from memtables and 1 sstables
9120 | Read 1 live and 0 tombstone cells
9854 | Read-repair DC_LOCAL
10033 | Executing single-partition query on users
10046 | Acquiring sstable references
10105 | Merging memtable tombstones
10189 | Key cache hit for sstable 1
10198 | Seeking to partition beginning in data file
10248 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
10261 | Merging data from memtables and 1 sstables
10296 | Read 1 live and 0 tombstone cells
12511 | Executing single-partition query on nt
12525 | Acquiring sstable references
12587 | Merging memtable tombstones
18067 | speculating read retry on /xxx.xxx.xxx.205
18577 | Sending READ message to xxx.xxx.xxx.205/xxx.xxx.xxx.205
25534 | Partition index with 6093 entries found for sstable 8885
25571 | Seeking to partition indexed section in data file
34989 | Partition index with 5327 entries found for sstable 8524
35022 | Seeking to partition indexed section in data file
36322 | Partition index with 333 entries found for sstable 8477
36336 | Seeking to partition indexed section in data file
714242 | Partition index with 299251 entries found for sstable 8541
714279 | Seeking to partition indexed section in data file
715717 | Partition index with 501 entries found for sstable 8217
715745 | Seeking to partition indexed section in data file
716232 | Partition index with 252 entries found for sstable 8888
716245 | Seeking to partition indexed section in data file
87 | READ message received from /xxx.xxx.xxx.203
50427 | Executing single-partition query on nt
50535 | Acquiring sstable references
50628 | Merging memtable tombstones
170441 | Partition index with 35650 entries found for sstable 6332
30718026 | Partition index with 199905 entries found for sstable 5958
30718077 | Seeking to partition indexed section in data file
170499 | Seeking to partition indexed section in data file
248898 | Partition index with 30958 entries found for sstable 6797
248962 | Seeking to partition indexed section in data file
67814573 | Read timeout: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
67814675 | Timed out; received 0 of 1 responses

我有 4 个节点，复制因子为 3（一个节点很轻，但不是 .203）我尝试读取的数据不是很多——即使没有推送 LIMIT 1到远程节点，间隔的低端应该是大约 3 小时前（我没有超过当前时间的纪元）

关于如何修复 this/what 可能出错的任何提示？我的 cassandra 版本是 2.1.9，运行大部分都是默认值

Table 架构如下（出于隐私原因，我无法发布整个架构，但显示密钥是我希望最重要的事情）

PRIMARY KEY (group_id, epoch, group_name, auto_generated_uuid_field)
) WITH CLUSTERING ORDER BY (epoch ASC, group_name ASC, auto_generated_uuid_field ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 7776000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

___________EDIT_____________ 回答以下问题：

状态输出：

--  Address         Load       Tokens  Owns    Host ID                               Rack                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DN  xxx.xxx.xxx.204  15.8 GB    1       ?       32ed196b-f6eb-4e93-b759  r1                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN  xxx.xxx.xxx.205  20.38 GB   1       ?       446d71aa-e9cd-4ca9-a6ac  r1                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN  xxx.xxx.xxx.202  1.48 GB    1       ?       2a6670b2-63f2-43be-b672  r1                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN  xxx.xxx.xxx.203  15.72 GB   1       ?       dd26dfee-82da-454b-8db2  r1

system.log 比较棘手，因为我在那里登录了很多...我看到的一个可疑的事情是

 WARN [CompactionExecutor:6] 2015-10-08 19:44:16,595 SSTableWriter.java (line 240) Compacting large partition nt_live_october/nt:254358 (230692316 bytes)

但这只是一个警告...我看到后不久

 INFO [CompactionExecutor:6] 2015-10-08 19:44:16,642 CompactionTask.java (line 274) Compacted 4 sstables to [/cassandra/data_dir_d/nt_live_october/nt-72813b106b9111e58f1ea1f0942ab78d/nt_live_october-nt-ka-9024,].  35,733,701 bytes to 30,186,394 (~84% of original) in 34,907ms = 0.824705MB/s.  21 total partitions merged to 18.  Partition merge counts were {1:17, 4:1, }

我在日志中看到了很多这样的对...但是没有错误级别的消息。压缩似乎进展顺利..它确实说这是最大的列族，但所有消息都是 INFO 级别....

Answer 1

首先，节点204的DN状态为down。检索它的 system.log 并寻找 :

异常和错误级别日志
异常 GC activity（收集时间超过 200 毫秒）
状态记录器

其次，数据在集群中分布不均。 202 的负载只有 1.48 GB。我怀疑您在其他节点上复制了一些非常大的分区。什么是复制因子？您的密钥空间方案是什么？您可以使用 cqlsh 命令回答这些问题：

DESCRIBE KEYSPACE nt_live_october;

麻烦跟踪卡桑德拉查询

Trouble tracing cassandra query

cql

cassandra