Cassandra read process
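Say I have a table with 4 columns and I write some data into it. If I then try to read that data, the process goes like this. I want to understand one specific scenario, in which all the columns (of the row I'm trying to read) are present in the memtable. Will SSTables be checked for data for such a row? I think there is no need to check SSTables in this case, because the data in the memtable will obviously be the latest copy. So a read in this case should be faster than when the memtable does not have the row or contains only part of its data.
I created a table (user_data) and inserted some data, which resulted in 2 SSTables being created. After that, I inserted a new row. I checked the data directory and made sure the SSTable count was still 2, which means the new data I inserted is in the memtable. I set 'tracing on' in cqlsh and then selected that same row. The output is given below: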
Tracing session: de2e8ce0-cf1e-11e6-9318-a131a78ce29a
activity | timestamp | source | source_elapsed | client
----------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+---------------
Execute CQL3 query | 2016-12-31 11:33:36.494000 | 172.16.129.67 | 0 | 172.16.129.67
Parsing select address,age from user_data where name='Kishan'; [Native-Transport-Requests-1] | 2016-12-31 11:33:36.495000 | 172.16.129.67 | 182 | 172.16.129.67
Preparing statement [Native-Transport-Requests-1] | 2016-12-31 11:33:36.495000 | 172.16.129.67 | 340 | 172.16.129.67
Executing single-partition query on user_data [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 | 693 | 172.16.129.67
Acquiring sstable references [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 | 765 | 172.16.129.67
Merging memtable contents [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 | 821 | 172.16.129.67
Read 1 live rows and 0 tombstone cells [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 | 1028 | 172.16.129.67
Request complete | 2016-12-31 11:33:36.495225 | 172.16.129.67 | 1225 | 172.16.129.67
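I don't understand what "Acquiring sstable references" means here. Since the complete data is in the memtable, as I understand it there is no need to check SSTables. So what exactly are these references used for?
all the columns (of the row which I'm trying to read) are present in the memtable. Will SSTables be checked for data for such a row?
In this particular case, it will also check the sstable data, in parallel with the memtable.
It will only go to an sstable (in fact, first the row cache, then the bloom filter, then the sstable) for columns that are not present in the memtable.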
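Edit:
To learn more about how the read process works, let's dig into the Cassandra source code. We'll start from the trace log and walk through the steps line by line:
Let's start here: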
Executing single-partition query on user_data [ReadStage-2]
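Your select query is a single-partition row query, which is obvious: Cassandra only needs to read data from a single partition. Let's jump to the corresponding method; its Javadoc is self-explanatory: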
/**
* Queries both memtable and sstables to fetch the result of this query.
* <p>
* Please note that this method:
* 1) does not check the row cache.
* 2) does not apply the query limit, nor the row filter (and so ignore 2ndary indexes).
* Those are applied in {@link ReadCommand#executeLocally}.
* 3) does not record some of the read metrics (latency, scanned cells histograms) nor
* throws TombstoneOverwhelmingException.
* It is publicly exposed because there is a few places where that is exactly what we want,
* but it should be used only where you know you don't need thoses things.
* <p>
* Also note that one must have created a {@code ReadExecutionController} on the queried table and we require it as
* a parameter to enforce that fact, even though it's not explicitlly used by the method.
*/
public UnfilteredRowIterator queryMemtableAndDisk(ColumnFamilyStore cfs, ReadExecutionController executionController)
{
assert executionController != null && executionController.validForReadOn(cfs);
Tracing.trace("Executing single-partition query on {}", cfs.name);
return queryMemtableAndDiskInternal(cfs);
}
From the step above, we can see that your query goes on to call the queryMemtableAndDiskInternal(cfs) method:
private UnfilteredRowIterator queryMemtableAndDiskInternal(ColumnFamilyStore cfs)
{
/*
* We have 2 main strategies:
* 1) We query memtables and sstables simulateneously. This is our most generic strategy and the one we use
* unless we have a names filter that we know we can optimize futher.
* 2) If we have a name filter (so we query specific rows), we can make a bet: that all column for all queried row
* will have data in the most recent sstable(s), thus saving us from reading older ones. This does imply we
* have a way to guarantee we have all the data for what is queried, which is only possible for name queries
* and if we have neither non-frozen collections/UDTs nor counters (indeed, for a non-frozen collection or UDT,
* we can't guarantee an older sstable won't have some elements that weren't in the most recent sstables,
* and counters are intrinsically a collection of shards and so have the same problem).
*/
if (clusteringIndexFilter() instanceof ClusteringIndexNamesFilter && !queriesMulticellType())
return queryMemtableAndSSTablesInTimestampOrder(cfs, (ClusteringIndexNamesFilter)clusteringIndexFilter());
...
...
We find the answer in this comment:
We have 2 main strategies:
1) We query memtables and sstables simulateneously. This is our most generic strategy and the one we use........
Cassandra queries memtables and sstables simultaneously.
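To make it concrete why the memtable copy cannot simply be assumed to be the latest, here is a minimal sketch of per-column reconciliation by write timestamp. It is not Cassandra's code: the class LastWriteWinsMerge, the Cell type and the sample values are hypothetical, and the one assumption it relies on is that a write can carry a client-supplied timestamp (for example via USING TIMESTAMP), so an already flushed sstable may hold a cell with a larger timestamp than the one in the memtable.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types exist in Cassandra.
class LastWriteWinsMerge {
    /** A cell value together with the write timestamp used for reconciliation. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merges column-to-cell maps from several sources; per column, the highest timestamp wins. */
    static Map<String, Cell> merge(List<Map<String, Cell>> sources) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> source : sources) {
            for (Map.Entry<String, Cell> entry : source.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(),
                        (current, candidate) -> candidate.timestamp > current.timestamp ? candidate : current);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> memtable = Map.of(
                "address", new Cell("Pune", 100L),
                "age", new Cell("30", 100L));
        // Hypothetical sstable holding a write made with a larger, client-supplied timestamp.
        Map<String, Cell> sstable = Map.of("address", new Cell("Mumbai", 200L));

        Map<String, Cell> row = merge(List.of(memtable, sstable));
        System.out.println("address = " + row.get("address").value);
        System.out.println("age = " + row.get("age").value);
    }
}
```

In this toy run the address comes from the sstable and the age from the memtable, which is the kind of case a read path has to rule out before it can skip the sstables.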
After that, if we jump to the queryMemtableAndSSTablesInTimestampOrder method, we find:
/**
* Do a read by querying the memtable(s) first, and then each relevant sstables sequentially by order of the sstable
* max timestamp.
*
* This is used for names query in the hope of only having to query the 1 or 2 most recent query and then knowing nothing
* more recent could be in the older sstables (which we can only guarantee if we know exactly which row we queries, and if
* no collection or counters are included).
* This method assumes the filter is a {@code ClusteringIndexNamesFilter}.
*/
private UnfilteredRowIterator queryMemtableAndSSTablesInTimestampOrder(ColumnFamilyStore cfs, ClusteringIndexNamesFilter filter)
{
Tracing.trace("Acquiring sstable references");
ColumnFamilyStore.ViewFragment view = cfs.select(View.select(SSTableSet.LIVE, partitionKey()));
ImmutableBTreePartition result = null;
Tracing.trace("Merging memtable contents");
.... // then it also looks into sstable on timestamp order.
That section accounts for the last two trace lines we were after:
Acquiring sstable references [ReadStage-2]
Merging memtable contents [ReadStage-2]
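The idea described in that Javadoc (memtable first, then sstables by descending max timestamp, stopping once nothing older can matter) can be sketched roughly as below. This is a simplified model under stated assumptions, not the actual Cassandra implementation: TimestampOrderedRead, Cell, SSTable and Result are hypothetical names, tombstones and collections are ignored, and each toy "sstable" is just an in-memory map. Note that in the quoted method the view is selected, and "Acquiring sstable references" is traced, before the memtable is merged, so that trace line on its own does not show that any sstable was read.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types exist in Cassandra.
class TimestampOrderedRead {
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Stand-in for an sstable: the cells it holds for one row plus their max timestamp. */
    static final class SSTable {
        final Map<String, Cell> cells;
        final long maxTimestamp;

        SSTable(Map<String, Cell> cells) {
            this.cells = cells;
            this.maxTimestamp = cells.values().stream()
                    .mapToLong(c -> c.timestamp).max().orElse(Long.MIN_VALUE);
        }
    }

    static final class Result {
        final Map<String, Cell> cells;
        final int sstablesRead;

        Result(Map<String, Cell> cells, int sstablesRead) {
            this.cells = cells;
            this.sstablesRead = sstablesRead;
        }
    }

    /** Reads the memtable first, then sstables from newest to oldest, stopping as early as possible. */
    static Result read(Map<String, Cell> memtable, List<SSTable> sstables, Set<String> queried) {
        Map<String, Cell> result = new HashMap<>();
        absorb(result, memtable, queried);

        List<SSTable> ordered = new ArrayList<>(sstables);
        ordered.sort(Comparator.comparingLong((SSTable s) -> s.maxTimestamp).reversed());

        int read = 0;
        for (SSTable sstable : ordered) {
            // Every remaining sstable is at most this new, so once we are complete we can stop.
            if (isComplete(result, queried, sstable.maxTimestamp)) break;
            absorb(result, sstable.cells, queried);
            read++;
        }
        return new Result(result, read);
    }

    /** True when every queried column already has a cell newer than anything the sstable could hold. */
    static boolean isComplete(Map<String, Cell> result, Set<String> queried, long sstableMaxTimestamp) {
        for (String column : queried) {
            Cell cell = result.get(column);
            if (cell == null || cell.timestamp <= sstableMaxTimestamp) return false;
        }
        return true;
    }

    /** Copies queried columns from a source into the result, keeping the newer cell per column. */
    static void absorb(Map<String, Cell> result, Map<String, Cell> source, Set<String> queried) {
        for (String column : queried) {
            Cell cell = source.get(column);
            if (cell != null) {
                result.merge(column, cell,
                        (current, candidate) -> candidate.timestamp > current.timestamp ? candidate : current);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Cell> memtable = Map.of(
                "address", new Cell("Pune", 100L),
                "age", new Cell("30", 100L));
        List<SSTable> sstables = List.of(
                new SSTable(Map.of("age", new Cell("29", 50L))),
                new SSTable(Map.of("address", new Cell("Delhi", 30L))));

        Result result = read(memtable, sstables, Set.of("address", "age"));
        System.out.println("sstables read: " + result.sstablesRead);
    }
}
```

In this model, a row whose queried columns are all in the memtable with timestamps newer than every sstable's max timestamp is answered without touching any sstable, while a memtable that holds only some of the columns forces the read to continue into the newest sstables.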
Hope this helps.