Cassandra long query time and adding to memtable when keys are fully constrained

I have a Cassandra table keyed like this:

PRIMARY KEY (("k1", "k2"), "c1", "c2")) WITH CLUSTERING ORDER BY ("c1" DESC, "c2" DESC);
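For context, a complete table definition with this key might look like the following. Only the key layout, clustering order, and the table name `feed` (taken from the trace) are from the question; the column types are assumptions:

```sql
-- Hypothetical reconstruction: column types are assumed,
-- only the key structure comes from the question.
CREATE TABLE feed (
    "k1" text,
    "k2" text,
    "c1" text,
    "c2" text,
    PRIMARY KEY (("k1", "k2"), "c1", "c2")
) WITH CLUSTERING ORDER BY ("c1" DESC, "c2" DESC);
```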

When I fully constrain a query, it takes much longer than when I leave off the last clustering key. It also performs an "Adding to feed memtable" step, which the less constrained query does not. Why is this? I previously knew that this query would not add the entry to the memtable, because I have custom code running whenever things are added to the memtable. That code should only run when something is inserted or modified, but it started running when I was only querying items.

Edit: I should mention that both queries return 1 row, and it is the same record.

  activity                                                                                                                                                                          | timestamp                  | source        | source_elapsed | client
 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+------------
                                                                                                                                                                 Execute CQL3 query | 2017-09-05 18:09:37.456000 | **.***.**.237 |              0 | ***.**.*.4
                                              Parsing select c2 from feed where k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z' and c2 = 'CCC'; [SharedPool-Worker-1] | 2017-09-05 18:09:37.456000 | **.***.**.237 |            267 | ***.**.*.4
                                                                                                                                          Preparing statement [SharedPool-Worker-1] | 2017-09-05 18:09:37.456000 | **.***.**.237 |            452 | ***.**.*.4
                                                                                                                     Executing single-partition query on feed [SharedPool-Worker-3] | 2017-09-05 18:09:37.457000 | **.***.**.237 |           1253 | ***.**.*.4
                                                                                                                                 Acquiring sstable references [SharedPool-Worker-3] | 2017-09-05 18:09:37.457000 | **.***.**.237 |           1312 | ***.**.*.4
                                                                                                                                    Merging memtable contents [SharedPool-Worker-3] | 2017-09-05 18:09:37.457000 | **.***.**.237 |           1370 | ***.**.*.4
                                                                                                                                 Key cache hit for sstable 22 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 |           6939 | ***.**.*.4
                                                                                                                                 Key cache hit for sstable 21 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 |           7077 | ***.**.*.4
                                                                                                                                 Key cache hit for sstable 12 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 |           7137 | ***.**.*.4
                                                                                                                                  Key cache hit for sstable 6 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 |           7194 | ***.**.*.4
                                                                                                                                  Key cache hit for sstable 3 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 |           7249 | ***.**.*.4
                                                                                                                                 Merging data from sstable 10 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 |           7362 | ***.**.*.4
                                                                                                                                 Key cache hit for sstable 10 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 |           7429 | ***.**.*.4
                                                                                                                                  Key cache hit for sstable 9 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 |           7489 | ***.**.*.4
                                                                                                                                  Key cache hit for sstable 4 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 |           7628 | ***.**.*.4
                                                                                                                                  Key cache hit for sstable 7 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 |           7720 | ***.**.*.4
                                                                                                                                 Defragmenting requested data [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 |           7779 | ***.**.*.4
                                                                                                                                      Adding to feed memtable [SharedPool-Worker-4] | 2017-09-05 18:09:37.464000 | **.***.**.237 |           7896 | ***.**.*.4
                                                                                                                            Read 1 live and 4 tombstone cells [SharedPool-Worker-3] | 2017-09-05 18:09:37.464000 | **.***.**.237 |           7932 | ***.**.*.4
                                                                                                                                                                   Request complete | 2017-09-05 18:09:37.464092 | **.***.**.237 |           8092 | ***.**.*.4

activity                                                                                                                                              | timestamp                  | source        | source_elapsed | client
-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+------------
                                                                                                                                    Execute CQL3 query | 2017-09-05 18:09:44.703000 | **.***.**.237 |              0 | ***.**.*.4
                                Parsing select c2 from feed where k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z'; [SharedPool-Worker-1] | 2017-09-05 18:09:44.704000 | **.***.**.237 |            508 | ***.**.*.4
                                                                                                             Preparing statement [SharedPool-Worker-1] | 2017-09-05 18:09:44.704000 | **.***.**.237 |            717 | ***.**.*.4
                                                                                        Executing single-partition query on feed [SharedPool-Worker-2] | 2017-09-05 18:09:44.704000 | **.***.**.237 |           1377 | ***.**.*.4
                                                                                                    Acquiring sstable references [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 |           1499 | ***.**.*.4
                                                                                                    Key cache hit for sstable 10 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 |           1730 | ***.**.*.4
                                                       Skipped 8/9 non-slice-intersecting sstables, included 5 due to tombstones [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 |           1804 | ***.**.*.4
                                                                                                    Key cache hit for sstable 22 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 |           1858 | ***.**.*.4
                                                                                                    Key cache hit for sstable 21 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 |           1908 | ***.**.*.4
                                                                                                    Key cache hit for sstable 12 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 |           1951 | ***.**.*.4
                                                                                                     Key cache hit for sstable 6 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 |           2002 | ***.**.*.4
                                                                                                     Key cache hit for sstable 3 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 |           2037 | ***.**.*.4
                                                                                       Merged data from memtables and 6 sstables [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 |           2252 | ***.**.*.4
                                                                                               Read 1 live and 4 tombstone cells [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 |           2307 | ***.**.*.4
                                                                                                                                      Request complete | 2017-09-05 18:09:44.705458 | **.***.**.237 |           2458 | ***.**.*.4
cqlsh> show version
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]

You are comparing apples and oranges.

  • The first query asks for all rows matching k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z' and c2 = 'CCC'. The extra condition here is c2 = 'CCC', so Cassandra needs to do more work to return the rows matching all of these conditions.

  • In the second query you relaxed the matching condition on c2, so you see different performance behavior.

Suppose you had 1000 rows matching k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z'. Adding the condition on c2 might return only 4 rows (and it may need to check the c2 condition against all of those rows), whereas removing that condition lets Cassandra start streaming results as soon as k1, k2, and c1 match.

  • If you really want to compare, you could compare:

k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z' and c2 = 'CCC' OR k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z' and c2 = 'XXX'
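Since CQL does not support OR in WHERE clauses, in practice this means running two separate fully-constrained point queries and comparing their traces (the value 'XXX' above is just a placeholder for some other existing c2 value):

```sql
TRACING ON;

SELECT c2 FROM feed WHERE k1 = 'AAA' AND k2 = 'BBB'
    AND c1 = '2017-09-05T16:09:00.222Z' AND c2 = 'CCC';

SELECT c2 FROM feed WHERE k1 = 'AAA' AND k2 = 'BBB'
    AND c1 = '2017-09-05T16:09:00.222Z' AND c2 = 'XXX';
```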

Also, when checking performance you should run the same query multiple times to avoid any caching effects.

This is a great question, and you've (helpfully) provided all the information we need to answer it!

Your first query is a point lookup (because you specify both clustering keys). The second is a slice.

If we look at the traces, the obvious difference between them is:

Skipped 8/9 non-slice-intersecting sstables, included 5 due to tombstones

This is a good hint that we're taking two different read paths. You could use it to dive into the code, but long story short: the filter you use for your point read means you'll query the memtable/sstables in a different order - for a point read we sort by sstable timestamp, while for a slice we first try to eliminate non-intersecting sstables.

The comments in the code hint at this - the first:

/**
 * Do a read by querying the memtable(s) first, and then each relevant sstables sequentially by order of the sstable
 * max timestamp.
 *
 * This is used for names query in the hope of only having to query the 1 or 2 most recent query and then knowing nothing
 * more recent could be in the older sstables (which we can only guarantee if we know exactly which row we queries, and if
 * no collection or counters are included).
 * This method assumes the filter is a {@code ClusteringIndexNamesFilter}.
 */

The second:

    /*
     * We have 2 main strategies:
     *   1) We query memtables and sstables simulateneously. This is our most generic strategy and the one we use
     *      unless we have a names filter that we know we can optimize futher.
     *   2) If we have a name filter (so we query specific rows), we can make a bet: that all column for all queried row
     *      will have data in the most recent sstable(s), thus saving us from reading older ones. This does imply we
     *      have a way to guarantee we have all the data for what is queried, which is only possible for name queries
     *      and if we have neither collections nor counters (indeed, for a collection, we can't guarantee an older sstable
     *      won't have some elements that weren't in the most recent sstables, and counters are intrinsically a collection
     *      of shards so have the same problem).
     */

In your case, the first (point) read would have been faster if the returned row had happened to be in the memtable. Also, since you have 8 sstables, you're probably using STCS or TWCS - if you used LCS, you'd likely have that partition compacted down to ~5 sstables, and (again) more predictable read performance.

I know previously this query would not add the entry to the memtable as I have custom code running when things are added to the memtable. This code should only run when things are inserted or modified but started running when I was only querying items.

By default, the read path shouldn't add anything to the memtable unless you're read repairing (that is, unless values mismatched between replicas, or a background read repair chance triggered). Note that the slice query is more likely to mismatch than the point query, since it's scan-based - you'll read-repair any/all of the deletion markers (tombstones) with matching values for c1 = '2017-09-05T16:09:00.222Z'.
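For what it's worth, in Cassandra 3.x the probabilistic background read repair is controlled by per-table options (they were removed in 4.0). If you want to rule this path out while investigating, you could set them to zero; note this does not disable the blocking read repair that happens on a genuine digest mismatch:

```sql
-- Cassandra 3.x only: disable chance-based background read repair
-- on the feed table while investigating the memtable writes.
ALTER TABLE feed
WITH read_repair_chance = 0
 AND dclocal_read_repair_chance = 0;
```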

Edit: I missed a line in your trace:

Defragmenting requested data

This tells us that you're using STCS and that you touched too many sstables, so the whole partition is being copied into the memtable to make future reads faster. This is a little-known optimization in STCS that kicks in when you start touching too many sstables, and you can avoid it by using LCS.
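If that read-time defragmentation (and the memtable hook it triggers) is a problem for you, switching the table to LCS would look like this:

```sql
-- Switch the feed table to LeveledCompactionStrategy;
-- read-time defragmentation only applies to STCS.
ALTER TABLE feed
WITH compaction = {'class': 'LeveledCompactionStrategy'};
```

Keep in mind this triggers recompaction of the table's existing sstables, so expect a burst of compaction activity after running it.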