Regarding sortedness guarantees despite immutability of SSTables

I am reading about LSM indexing in Martin Kleppmann's Designing Data-Intensive Applications.

The author states:

When a write comes in, add it to an in-memory balanced tree data structure (for example, a red-black tree). This in-memory tree is sometimes called a memtable.
When the memtable gets bigger than some threshold—typically a few megabytes —write it out to disk as an SSTable file. This can be done efficiently because the tree already maintains the key-value pairs sorted by key. The new SSTable file becomes the most recent segment of the database. While the SSTable is being written out to disk, writes can continue to a new memtable instance.
In order to serve a read request, first try to find the key in the memtable, then in the most recent on-disk segment, then in the next-older segment, etc.
From time to time, run a merging and compaction process in the background to combine segment files and to discard overwritten or deleted values.
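
To make the quoted steps concrete, here is a minimal sketch of the write path and the point-read path. It is a toy under stated assumptions, not how any real engine does it: `TinyLSM`, `threshold`, and the in-memory tuples standing in for on-disk segment files are all hypothetical names, a plain dict replaces the balanced tree, and compaction is left out.

```python
import bisect


class TinyLSM:
    """Toy LSM store: a dict memtable plus a list of immutable sorted segments."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # hypothetical flush threshold, counted in entries
        self.memtable = {}          # stands in for the in-memory balanced tree
        self.segments = []          # oldest first; each is a sorted tuple of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.threshold:
            self.flush()

    def flush(self):
        # Write the memtable out in key order. The resulting segment is a tuple,
        # so it is never modified afterwards; new writes go to a fresh memtable.
        self.segments.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        # Check the memtable first, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            # (key,) sorts just before (key, value), so this finds the entry if present.
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None
```

Because `get` consults newer data first, a later write to the same key shadows the older value without the older segment ever being touched.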

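My question is: given that SSTables on disk are immutable, how is sortedness guaranteed when new data comes in that would change the order of the data in an SSTable (not in the in-memory memtable)?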

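For example, suppose we have an SSTable on disk with key-value pairs like [{1:a},{3:c},{4:d}]. The in-memory memtable contains [{5:e},{6:f}] (kept sorted with an AVL/RB tree). Suppose we now get a new entry, [{2:b}], which belongs between [{1:a}] and [{3:c}]. How is this handled if the SSTable(s) on disk are immutable? In theory, we could create a new SSTable containing [{2:b}] and let compaction merge them later, but wouldn't that break range queries/reads performed before compaction happens?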

Thanks!

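When new data arrives, it lands in a new SSTable rather than modifying an existing one. Each SSTable is read separately, and the data from all SSTables and the memtable is then merged in memory into the correct order before it is sent back. Because every source is already sorted, range queries still return ordered results before compaction runs. See, for example, this doc for how data is read.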
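
As a rough illustration of that merge-on-read step, here is a minimal sketch using the numbers from the question. The function name `range_scan` and the list-of-pairs representation of an SSTable are assumptions made for the example, and deletions (tombstones) are not handled. It relies on Python's standard `heapq.merge`, which lazily merges already-sorted inputs.

```python
import heapq


def range_scan(memtable, sstables, lo, hi):
    """Yield (key, value) pairs with lo <= key <= hi, in key order.

    memtable: dict holding the newest writes.
    sstables: list of sorted (key, value) lists, oldest first (hypothetical layout).
    """
    # Order the sources newest first, so a smaller "age" means fresher data.
    sources = [sorted(memtable.items())] + list(reversed(sstables))

    def tagged(source, age):
        # Tag each entry with its source's age so ties on key sort newest first.
        for key, value in source:
            if lo <= key <= hi:
                yield key, age, value

    streams = [tagged(src, age) for age, src in enumerate(sources)]

    last_key = object()  # sentinel that equals no real key
    for key, age, value in heapq.merge(*streams):
        if key != last_key:  # first hit per key is the newest version; skip older ones
            yield key, value
            last_key = key


# The scenario from the question: {2:b} sits in its own, newer SSTable.
sstables = [[(1, "a"), (3, "c"), (4, "d")], [(2, "b")]]
memtable = {5: "e", 6: "f"}
print(list(range_scan(memtable, sstables, 1, 6)))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
```

No file is rewritten here: key 2 comes back between 1 and 3 purely because the reader interleaves the sorted sources. Compaction later does the same merge once and writes the result out, which reduces the number of sources a read has to consult.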