Cassandra long query time and adding to memtable when keys are fully constrained
I have a Cassandra table keyed as follows:
PRIMARY KEY (("k1", "k2"), "c1", "c2")
) WITH CLUSTERING ORDER BY ("c1" DESC, "c2" DESC);
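For context, the snippet above is only the tail of the table definition; a full statement consistent with it might look like the following (the column types are assumptions, since the original DDL is not shown):

```sql
CREATE TABLE feed (
    "k1" text,
    "k2" text,
    "c1" text,   -- holds ISO-8601 timestamps in the traces below
    "c2" text,
    PRIMARY KEY (("k1", "k2"), "c1", "c2")
) WITH CLUSTERING ORDER BY ("c1" DESC, "c2" DESC);
```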
When I fully constrain a query, it takes considerably longer than when I leave off the last clustering key. It also performs an "Adding to feed memtable" step, which the less constrained query does not. Why is this? I know that previously this query did not add the entry to the memtable, because I have custom code that runs when things are added to the memtable. That code should only run when something is inserted or modified, but it started running when I was merely querying items.
Edit: I should mention that both queries return 1 row, and it is the same record.
activity | timestamp | source | source_elapsed | client
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+------------
Execute CQL3 query | 2017-09-05 18:09:37.456000 | **.***.**.237 | 0 | ***.**.*.4
Parsing select c2 from feed where k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z' and c2 = 'CCC'; [SharedPool-Worker-1] | 2017-09-05 18:09:37.456000 | **.***.**.237 | 267 | ***.**.*.4
Preparing statement [SharedPool-Worker-1] | 2017-09-05 18:09:37.456000 | **.***.**.237 | 452 | ***.**.*.4
Executing single-partition query on feed [SharedPool-Worker-3] | 2017-09-05 18:09:37.457000 | **.***.**.237 | 1253 | ***.**.*.4
Acquiring sstable references [SharedPool-Worker-3] | 2017-09-05 18:09:37.457000 | **.***.**.237 | 1312 | ***.**.*.4
Merging memtable contents [SharedPool-Worker-3] | 2017-09-05 18:09:37.457000 | **.***.**.237 | 1370 | ***.**.*.4
Key cache hit for sstable 22 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 6939 | ***.**.*.4
Key cache hit for sstable 21 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7077 | ***.**.*.4
Key cache hit for sstable 12 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7137 | ***.**.*.4
Key cache hit for sstable 6 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7194 | ***.**.*.4
Key cache hit for sstable 3 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7249 | ***.**.*.4
Merging data from sstable 10 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7362 | ***.**.*.4
Key cache hit for sstable 10 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7429 | ***.**.*.4
Key cache hit for sstable 9 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7489 | ***.**.*.4
Key cache hit for sstable 4 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7628 | ***.**.*.4
Key cache hit for sstable 7 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7720 | ***.**.*.4
Defragmenting requested data [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7779 | ***.**.*.4
Adding to feed memtable [SharedPool-Worker-4] | 2017-09-05 18:09:37.464000 | **.***.**.237 | 7896 | ***.**.*.4
Read 1 live and 4 tombstone cells [SharedPool-Worker-3] | 2017-09-05 18:09:37.464000 | **.***.**.237 | 7932 | ***.**.*.4
Request complete | 2017-09-05 18:09:37.464092 | **.***.**.237 | 8092 | ***.**.*.4
activity | timestamp | source | source_elapsed | client
-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+------------
Execute CQL3 query | 2017-09-05 18:09:44.703000 | **.***.**.237 | 0 | ***.**.*.4
Parsing select c2 from feed where k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z'; [SharedPool-Worker-1] | 2017-09-05 18:09:44.704000 | **.***.**.237 | 508 | ***.**.*.4
Preparing statement [SharedPool-Worker-1] | 2017-09-05 18:09:44.704000 | **.***.**.237 | 717 | ***.**.*.4
Executing single-partition query on feed [SharedPool-Worker-2] | 2017-09-05 18:09:44.704000 | **.***.**.237 | 1377 | ***.**.*.4
Acquiring sstable references [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1499 | ***.**.*.4
Key cache hit for sstable 10 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1730 | ***.**.*.4
Skipped 8/9 non-slice-intersecting sstables, included 5 due to tombstones [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1804 | ***.**.*.4
Key cache hit for sstable 22 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1858 | ***.**.*.4
Key cache hit for sstable 21 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1908 | ***.**.*.4
Key cache hit for sstable 12 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1951 | ***.**.*.4
Key cache hit for sstable 6 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 | 2002 | ***.**.*.4
Key cache hit for sstable 3 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 | 2037 | ***.**.*.4
Merged data from memtables and 6 sstables [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 | 2252 | ***.**.*.4
Read 1 live and 4 tombstone cells [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 | 2307 | ***.**.*.4
Request complete | 2017-09-05 18:09:44.705458 | **.***.**.237 | 2458 | ***.**.*.4
cqlsh> show version [cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
You are comparing apples and oranges.
The first query asks for all rows matching the conditions
k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z' and c2 = 'CCC'
The additional condition here is c2 = 'CCC', so Cassandra has to do more work before returning the rows that satisfy all of these conditions. In the second query you relax the match condition on c2, which is why you see different performance behavior.
Suppose you have 1000 rows matching k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z'. Adding the condition on c2 might return only 4 rows (and it may need to check the c2 condition against all 1000), whereas dropping that condition lets Cassandra start streaming results as soon as k1, k2, and c1 match.
If you really want a fair comparison, compare
k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z' and c2 = 'CCC'
against
k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z' and c2 = 'XXX'
Also, when checking performance you should run the same query several times to rule out any caching effects.
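For example, you can enable tracing in cqlsh and re-run the same statement a few times so the key cache and OS page cache are warm (the literal values are the ones from the question):

```sql
TRACING ON;
SELECT c2 FROM feed
 WHERE k1 = 'AAA' AND k2 = 'BBB'
   AND c1 = '2017-09-05T16:09:00.222Z' AND c2 = 'CCC';
-- run it again and compare source_elapsed between runs
SELECT c2 FROM feed
 WHERE k1 = 'AAA' AND k2 = 'BBB'
   AND c1 = '2017-09-05T16:09:00.222Z' AND c2 = 'CCC';
TRACING OFF;
```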
This is a great question, and you have (helpfully) provided all the information needed to answer it!
Your first query is a point lookup (because you specify both clustering keys). The second is a slice.
Looking at the traces, the telling difference between them is:
Skipped 8/9 non-slice-intersecting sstables, included 5 due to tombstones
This is a strong hint that two different read paths are being taken. You could use it to dive into the code, but long story short: the filter used for your point read means you'll query the memtable/sstables in a different order - for a point read we order by timestamp, while for a slice we first try to eliminate non-intersecting sstables.
The comments in the code hint at this - the first:
/**
* Do a read by querying the memtable(s) first, and then each relevant sstables sequentially by order of the sstable
* max timestamp.
*
* This is used for names query in the hope of only having to query the 1 or 2 most recent query and then knowing nothing
* more recent could be in the older sstables (which we can only guarantee if we know exactly which row we queries, and if
* no collection or counters are included).
* This method assumes the filter is a {@code ClusteringIndexNamesFilter}.
*/
The second:
/*
* We have 2 main strategies:
* 1) We query memtables and sstables simulateneously. This is our most generic strategy and the one we use
* unless we have a names filter that we know we can optimize futher.
* 2) If we have a name filter (so we query specific rows), we can make a bet: that all column for all queried row
* will have data in the most recent sstable(s), thus saving us from reading older ones. This does imply we
* have a way to guarantee we have all the data for what is queried, which is only possible for name queries
* and if we have neither collections nor counters (indeed, for a collection, we can't guarantee an older sstable
* won't have some elements that weren't in the most recent sstables, and counters are intrinsically a collection
* of shards so have the same problem).
*/
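To make the two strategies concrete, here is a toy Python model - purely illustrative, not Cassandra's actual code; the dict layout and names like max_ts are my own invention. A names (point) read walks sstables newest-first and can stop early, while a slice read first discards sstables whose clustering range cannot intersect the slice and then merges whatever remains:

```python
# Toy model of Cassandra's two single-partition read strategies.
# Each "sstable" is a dict: id, max_ts (newest write it contains),
# min_key/max_key (its clustering range), and its rows.

def point_read(memtable, sstables, key):
    """Names filter: memtable first, then sstables newest-first,
    stopping as soon as the requested row is found (the 'bet')."""
    if key in memtable:
        return memtable[key], []          # no sstable touched at all
    consulted = []
    for s in sorted(sstables, key=lambda t: t["max_ts"], reverse=True):
        consulted.append(s["id"])
        if key in s["rows"]:
            return s["rows"][key], consulted
    return None, consulted

def slice_read(memtable, sstables, lo, hi):
    """Slice filter: skip non-intersecting sstables up front, then
    merge the survivors with the memtable."""
    hits = {k: v for k, v in memtable.items() if lo <= k <= hi}
    consulted = []
    for s in sstables:
        if s["max_key"] < lo or s["min_key"] > hi:
            continue                      # "non-slice-intersecting": skipped
        consulted.append(s["id"])
        for k, v in s["rows"].items():
            if lo <= k <= hi:
                hits.setdefault(k, v)     # memtable data wins in this toy
    return hits, consulted

# Two sstables: #2 is newer and holds the row we want.
tables = [
    {"id": 1, "max_ts": 10, "min_key": "a", "max_key": "f", "rows": {"c": "old"}},
    {"id": 2, "max_ts": 20, "min_key": "m", "max_key": "z", "rows": {"p": "new"}},
]
```

A point read of "p" consults only sstable 2 (newest-first, early exit); a slice over ["n", "q"] never opens sstable 1 because its clustering range cannot intersect the slice.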
In your case, the first (point) read would be faster if the returned row happened to be in the memtable. Also, since you have 8 sstables you are probably using STCS or TWCS - with LCS that partition would likely be compacted down to ~5 sstables, and you would (again) have more predictable read performance.
I know previously this query would not add the entry to the memtable as I have custom code running when things are added to the memtable. This code should only run when things are inserted or modified but started running when I was only querying items.
By default, neither read path should add anything to the memtable unless you are read-repairing (that is, unless values mismatch between replicas, or a background read-repair chance is triggered). Note that a slice query is more likely to mismatch than a point query, because it is scan-based - you would read-repair any/all deletion markers (tombstones) and matching values for c1 = '2017-09-05T16:09:00.222Z'.
Edit: there is a line in your trace I missed:
Defragmenting requested data
This indicates you are using STCS and touching too many sstables, so the whole partition is being copied into the memtable to make future reads faster. This is a little-known optimization in STCS that kicks in when you start touching too many sstables; you can avoid it with LCS.
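If you do want to try LCS, the switch is a single schema change (a sketch; sstable_size_in_mb is optional and shown with its default):

```sql
ALTER TABLE feed
 WITH compaction = {'class': 'LeveledCompactionStrategy',
                    'sstable_size_in_mb': 160};
```

Note that changing the compaction strategy triggers a rewrite of existing sstables, so expect extra I/O while the table recompacts.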