WiredTiger and in-place updates

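I have a users collection. Each user has a "geoposition" field that is updated frequently (every time the user moves significantly). Because I want updates to be concurrent at the document level rather than the collection level, I use the WiredTiger storage engine.

I learned that with WiredTiger, each update to a document causes a new document to be created: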

http://learnmongodbthehardway.com/schema/wiredtiger/

WiredTiger does not support in place updates

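However, this article also says that "Even though [WiredTiger] does not allow for in-place updates, it could still perform better than MMAP for many workloads". What does this mean? What are the exact implications that I must be aware of when I use WiredTiger? For example, without in-place updates will the database size grow quickly? Is there anything else I should be aware of?

I also learned that WiredTiger in MongoDB 3.6 added the capability to store deltas rather than re-writing the entire document (https://jira.mongodb.org/browse/DOCS-11416). What does this mean, exactly?

NOTE: Also what I don't understand is that nowadays most (if not all) hard drives have a sector size of 4096 bytes, so you cannot write only 4 bytes (for example) to the hard drive but instead must write the full block of 4096 bytes (so read it first, update the 4 bytes in it, and then write it). As most documents are smaller than 4096 bytes, does this mean that re-writing the whole document is necessary in any case (even with MMAP)? What did I miss?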

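With the legacy MMAPv1 storage engine (removed in MongoDB 4.2), in-place updates were frequently highlighted as an optimization strategy because document indexes pointed directly to file locations and offsets. Moving a document to a new storage location (particularly if there were many index entries to update) had more overhead for MMAPv1 than an in-place update, which only had to update the changed fields.

WiredTiger does not support in-place updates because internally it uses MVCC (Multiversion concurrency control), which is commonly used by database management systems. This is a significant technical improvement over the simplistic view in MMAP, and allows for building more advanced features like isolation levels and transactions. WiredTiger's indexes have a level of indirection (referencing an internal RecordID instead of a file location and offset), so document moves at a storage level are not a significant pain point.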
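To make the indirection point concrete, here is a minimal toy model in Python. It is only a sketch: the class names, the dictionaries, and the integer "locations" are all hypothetical and do not reflect how either storage engine is actually implemented. It simply counts how many entries have to be rewritten when a document moves.

```python
# Toy model contrasting direct-location indexes with RecordID indirection.
# All names here are hypothetical; this is not MongoDB or WiredTiger code.

class DirectIndexStore:
    """Each index entry stores the document's physical location."""

    def __init__(self, index_names):
        self.indexes = {name: {} for name in index_names}  # key -> location

    def insert(self, keys, location):
        for name, key in keys.items():
            self.indexes[name][key] = location

    def move(self, keys, new_location):
        """Rewrite one entry per index; return the number of entries written."""
        writes = 0
        for name, key in keys.items():
            self.indexes[name][key] = new_location
            writes += 1
        return writes


class IndirectIndexStore:
    """Each index entry stores a record id; one table maps id -> location."""

    def __init__(self, index_names):
        self.indexes = {name: {} for name in index_names}  # key -> record id
        self.records = {}                                  # record id -> location

    def insert(self, keys, record_id, location):
        self.records[record_id] = location
        for name, key in keys.items():
            self.indexes[name][key] = record_id

    def move(self, record_id, new_location):
        """Only the record table changes; index entries stay untouched."""
        self.records[record_id] = new_location
        return 1


# A user document indexed on three fields (illustrative values).
keys = {"email": "ann@example.com", "name": "Ann", "geoposition": (1.0, 2.0)}

direct = DirectIndexStore(keys.keys())
direct.insert(keys, location=100)

indirect = IndirectIndexStore(keys.keys())
indirect.insert(keys, record_id=7, location=100)

direct_writes = direct.move(keys, new_location=200)
indirect_writes = indirect.move(record_id=7, new_location=200)
```

In this model the cost of a move grows with the number of indexes only in the direct case, which is the reason document moves mattered so much more for MMAPv1.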

However, this article also says that "Even though [WiredTiger] does not allow for in-place updates, it could still perform better than MMAP for many workloads".

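This means that although MMAPv1 may have a more efficient path for in-place updates, WiredTiger has other advantages such as compression and improved concurrency control. You could perhaps construct a workload consisting only of in-place updates to a few documents that might perform better in MMAPv1, but actual workloads are typically more varied. The only way to confirm the impact for a given workload is to test in a representative environment.

However, the general choice of MMAPv1 vs WiredTiger is moot if you want to plan for the future: WiredTiger has been the default storage engine since MongoDB 3.2, and some newer product features are not supported by MMAPv1. For example, MMAPv1 does not support Majority Read Concern, which in turn means it cannot be used for Replica Set Config Servers (required for sharding in MongoDB 3.4+) or Change Streams (MongoDB 3.6+). MMAPv1 was deprecated in MongoDB 4.0 and removed in MongoDB 4.2.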

What are the exact implications that I must be aware of when I use WiredTiger? For example, without in-place updates will the database size grow quickly?

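Storage outcomes depend on several factors, including your schema design, workload, configuration, and version of MongoDB server. MMAPv1 and WiredTiger use different record allocation strategies, but both will try to use preallocated space that is marked as free/reusable. In general, WiredTiger is more efficient in its use of storage space, and it also has the advantage of compression for data and indexes. MMAPv1 allocates additional storage space to try to optimise for in-place updates and avoid document moves, although you can choose a "no padding" strategy for collections where the workload doesn't change document size over time.

There has been significant investment in improving and tuning WiredTiger for different workloads since it was first introduced in MongoDB 3.0, so I would strongly encourage testing with the latest production release series for best outcomes. If you have a specific question about schema design and storage growth, I'd suggest posting details on DBA StackExchange for discussion.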

I also learned that WiredTiger in MongoDB 3.6 added the capability to store deltas rather than re-writing the entire document (https://jira.mongodb.org/browse/DOCS-11416). What does this mean, exactly?

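This is an implementation detail that improves WiredTiger's internal data structures for some use cases. In particular, WiredTiger in MongoDB 3.6+ can be more efficient at working with small changes to large documents (as compared to previous releases). The WiredTiger cache needs to be able to return multiple versions of a document as long as they are in use by open internal sessions (MVCC, as mentioned earlier), so for large documents with small updates it can be more efficient to store a list of deltas. However, if too many deltas accumulate (or the deltas change most of the fields in a document), this approach may be less performant than maintaining multiple copies of the full document.

When data is committed to disk via a checkpoint, a full version of the document still needs to be written. If you want to learn more about some of the internals, the MongoDB Path To Transactions series of videos covers the development of features to support multi-document transactions in MongoDB 4.0.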
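As a rough illustration of the delta idea (not WiredTiger's actual data structure; the `DeltaVersionChain` class and its methods are hypothetical names invented for this sketch), the following Python keeps one full base copy of a document plus a list of small per-update deltas, and rebuilds any version on demand:

```python
# Toy model of keeping several versions of one document in memory as
# a base copy plus deltas. Hypothetical names; not WiredTiger internals.

class DeltaVersionChain:
    def __init__(self, base):
        self.base = dict(base)  # the only full copy that is stored
        self.deltas = []        # one small dict of changed fields per update

    def update(self, changed_fields):
        """Record only the fields that changed, not the whole document."""
        self.deltas.append(dict(changed_fields))

    def read(self, version):
        """Rebuild the document as of `version` (0 is the base copy)."""
        doc = dict(self.base)
        for delta in self.deltas[:version]:
            doc.update(delta)
        return doc

    def latest(self):
        return self.read(len(self.deltas))


# A large document where only a small field changes (illustrative values).
user = {"name": "Ann", "bio": "x" * 10_000, "geoposition": [0.0, 0.0]}
chain = DeltaVersionChain(user)
chain.update({"geoposition": [1.0, 1.0]})
chain.update({"geoposition": [2.0, 2.0]})

oldest = chain.read(0)   # what a reader that started before the updates sees
newest = chain.latest()  # what a new reader sees
```

The trade-off mentioned above shows up in `read`: every delta in the chain has to be applied to rebuild a version, so a long chain (or deltas that touch most fields) can end up costing more than simply keeping full copies.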

Also what I don't understand is that nowadays most (if not all) hard drives have a sector size of 4096 bytes, so you cannot write only 4 bytes (for example) to the hard drive but instead must write the full block of 4096 bytes (so read it first, update the 4 bytes in it, and then write it). As most documents are smaller than 4096 bytes, does this mean that re-writing the whole document is necessary in any case (even with MMAP)? What did I miss?

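Without going too far into implementation details or trying to explain all of the moving parts involved, consider how the different approaches apply to workloads where many documents are being updated (rather than to a single document), as well as the impact on memory usage (before documents are written to disk). Depending on factors like document size and compression, a single block of I/O could represent anywhere from a fraction of a document (max size 16MB) to multiple documents.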
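The block arithmetic can be sketched in a few lines of Python. This is only back-of-the-envelope math under the question's assumption of 4096-byte blocks; it ignores compression, file system behaviour, and how a storage engine actually lays out its pages, and `blocks_touched` is a hypothetical helper name.

```python
BLOCK_SIZE = 4096  # bytes; the sector size assumed in the question

def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Return how many fixed-size blocks a write of `length` bytes at `offset` covers."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


small_inside = blocks_touched(100, 4)            # 4 bytes well inside one block
small_straddling = blocks_touched(4094, 4)       # 4 bytes crossing a block boundary
max_document = blocks_touched(0, 16 * 1024 * 1024)  # a 16MB document
```

So a 4-byte change costs at least one full block either way, while a rewrite of a large document touches many blocks; the difference between the approaches only becomes visible once documents are larger than a block or many documents are changing.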

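In MongoDB the general flow is that documents are updated in an in-memory view (for example, the WiredTiger cache), with changes persisted to disk in a fast append-only journal format before being periodically flushed to the data files. If the O/S only has to write blocks of data that have changed, touching fewer blocks of data requires less overall I/O.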
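A deliberately simplified Python sketch of that flow follows. The `ToyStore` class is hypothetical, and plain dictionaries and lists stand in for the cache, the journal, and the data files; real journaling and checkpointing are far more involved.

```python
# Toy model of "update in memory, append to a journal, flush periodically".
# Hypothetical names throughout; this is not how MongoDB is implemented.

class ToyStore:
    def __init__(self):
        self.cache = {}      # in-memory view of documents
        self.journal = []    # append-only log of individual changes
        self.data_file = {}  # stands in for the on-disk data files
        self.dirty = set()   # ids changed since the last checkpoint

    def update(self, doc_id, fields):
        """Apply a change in memory and append it to the journal."""
        doc = dict(self.cache.get(doc_id, self.data_file.get(doc_id, {})))
        doc.update(fields)
        self.cache[doc_id] = doc
        self.journal.append((doc_id, dict(fields)))
        self.dirty.add(doc_id)

    def checkpoint(self):
        """Write the full latest version of each dirty document once.

        Returns the number of documents written. Clearing the journal here
        is a simplification of this sketch.
        """
        written = len(self.dirty)
        for doc_id in self.dirty:
            self.data_file[doc_id] = dict(self.cache[doc_id])
        self.dirty.clear()
        self.journal.clear()
        return written


store = ToyStore()
store.update("user1", {"geoposition": [0.0, 0.0]})
store.update("user1", {"geoposition": [1.0, 1.0]})
store.update("user1", {"geoposition": [2.0, 2.0]})

journal_entries = len(store.journal)  # one small entry per update
docs_written = store.checkpoint()     # one full document write for all three
```

In this model, three frequent updates to the same document produce three small journal appends but only one full write to the data file at the next checkpoint.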