MongoDB queries from multiple processes; how to implement atomicity?

Question

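I have a MongoDB database with multiple Node.js processes reading and writing documents. I would like to know how I can make it so that only one process can work on a document at a time (some sort of locking), with the lock released once the process has finished updating that entry.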

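My application should do the following:

Walk through the entries one by one with a cursor.
(Lock the entry so no other process can work with it.)
Fetch information from a third-party site.
Calculate the new information and update the entry.
(Unlock the document.)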

Also, after a document has been unlocked, no other process needs to update it for a few hours.

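Later on I would like to set up multiple MongoDB clusters to reduce the load on the databases, so the solution should work for both single and multiple database servers, or at least for multiple MongoDB servers.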

Answer 1

According to the MongoDB documentation: "isolated: Prevents a write operation that affects multiple documents from yielding to other reads or writes once the first document is written... $isolated operator causes write operations to acquire an exclusive lock on the collection...will make WiredTiger single-threaded for the duration of the operation." So if you are updating multiple documents, you could first get the data from the third-party API, parse the information into an array, and then run something like this in the Mongo shell:

db.foo.update(
    { status : "A" , $isolated : 1 },
    { $set: { < your key >: < your info >}}, //use the info in your array
    { multi: true }
)

Or, if you have to update the documents one by one, you could use findAndModify() or updateOne() from the MongoDB Node.js driver. Note that, per the MongoDB documentation, 'When modifying a single document, both findAndModify() and the update() method atomically update the document.'

An example of updating one at a time: first you connect to the mongod with the Node.js driver, then connect to the third-party API using, for example, Node.js's request module, get and parse the data, and then use the data to modify your documents, as shown below:

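var request = require('request');
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/test', function(err, db) {
    if (err) throw err;
    var collection = db.collection('simple_query');
    // Walk the cursor one document at a time.
    collection.find().forEach(
        function(doc) {
            // Fetch the third-party data for this document.
            request('http://www.google.com', function(error, response, body) {
                if (error) return console.error(error);
                console.log('body:', body); // parse body for your info
                // Atomically modify the single matching document.
                // The 2.x driver signature is findAndModify(query, sort, doc, ...),
                // so an empty sort is passed as the second argument.
                collection.findAndModify({
                    <query based on your doc>
                }, [], {
                    $set: { < your key >: < your info > }
                });
            });
        }, function(err) {
            // Called once the cursor is exhausted, or with an error.
            if (err) console.error(err);
        });
});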

Answer 2

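An elegant solution that does not involve locks is:

Add a version property to your document.
When updating the document, increment the version property.
When updating the document, include the last read version in the find query. If your document has been updated elsewhere, the find query will yield no results and your update will fail.
If your update fails, you can retry the operation.

I have used this pattern with great success in the past.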

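Example

Imagine you have a document {_id: 123, version: 1}.

Now imagine you have 3 Mongo clients all executing db.collection.findAndModify({ query: {_id: 123, version: 1}, update: { $inc: { version: 1 } }}); concurrently.

The first update will apply and the remaining ones will fail. Why? Because version is now 2, and the query included version: 1.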
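The version check and the retry step described above could be sketched roughly as follows with the Node.js driver. This is only an illustration under assumptions: `collection` is taken to be a driver Collection (using `updateOne` and `findOne`), and `updateIfUnchanged`, `updateWithRetry`, `computeChanges` and `maxAttempts` are made-up names that do not appear in the answer.

```javascript
// Sketch of the version-check pattern. `collection` is assumed to be a
// MongoDB Node.js driver Collection (or any object with the same methods).

// Apply `changes` only if the stored version still equals the one we read.
async function updateIfUnchanged(collection, doc, changes) {
  const result = await collection.updateOne(
    { _id: doc._id, version: doc.version },  // no match if someone else bumped the version
    { $set: changes, $inc: { version: 1 } }  // store the new data and bump the version together
  );
  return result.modifiedCount === 1;
}

// Re-read and retry when another writer got there first.
// `computeChanges` and `maxAttempts` are hypothetical names for this sketch.
async function updateWithRetry(collection, id, computeChanges, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = await collection.findOne({ _id: id });
    if (!doc) return false;                     // document no longer exists
    const changes = await computeChanges(doc);  // e.g. fetch the third-party data
    if (await updateIfUnchanged(collection, doc, changes)) return true;
  }
  return false;                                 // gave up after maxAttempts conflicts
}
```

Because the filter and the update are sent as one single-document operation, the compare and the write happen atomically on the server; a `false` return simply means another client won the race.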

Answer 3

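I ran into this question today, and I feel like it has been left open.

First of all, findAndModify really does seem like the way to go about it, but I found holes in both of the suggested answers:

In Treefish Zhang's answer: if you run multiple processes in parallel, they will query the same documents, because at the start you use "find" rather than "findAndModify". You only use "findAndModify" after the processing is done, so during processing the document is still not updated and other processes can query it as well.

In arboreal84's answer: what happens if the process crashes in the middle of handling an entry? If you update the version when you query and the process then crashes, you have no idea whether the operation succeeded.

Therefore, I think the most reliable approach is to have several fields: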

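version
locked: [true/false],
lockedAt: [timestamp] (optional - in case the process crashed and was unable to unlock, you may want to retry after x time)
attempts: 0 (optional - increment this if you want to know how many processing attempts were made, which is useful for counting retries)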

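Then, for your code:

findAndModify: where version=oldVersion and locked=false, set locked=true, lockedAt=now
Process the entry
If processing succeeded, set locked=false, version=newVersion
If processing failed, set locked=false
Optional: for retries after a TTL, you can also query by "or locked=true and lockedAt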
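The steps above might look roughly like this with the Node.js driver. It is a sketch under several assumptions, not the answerer's code: it uses `updateOne` instead of findAndModify because only a success flag is needed when the document is already known, `LOCK_TTL_MS`, `tryLock`, `unlock`, `processWithLock` and `processEntry` are made-up names, and the stale-lock condition (a lock older than the TTL may be taken over) is just one guess at how the retry after x time could be expressed.

```javascript
// Sketch of the lock fields described above. `collection` is assumed to be a
// MongoDB Node.js driver Collection. LOCK_TTL_MS is a hypothetical setting.
const LOCK_TTL_MS = 10 * 60 * 1000;

// Try to claim one document. Succeeds only if the version is unchanged and the
// document is unlocked, or its lock is older than the TTL (assumed crash).
async function tryLock(collection, doc, now = new Date()) {
  const staleBefore = new Date(now.getTime() - LOCK_TTL_MS);
  const result = await collection.updateOne(
    {
      _id: doc._id,
      version: doc.version,
      $or: [{ locked: false }, { locked: true, lockedAt: { $lt: staleBefore } }],
    },
    { $set: { locked: true, lockedAt: now }, $inc: { attempts: 1 } }
  );
  return result.modifiedCount === 1;
}

// Release the lock. When `changes` is given (success), also store the new
// data and bump the version; otherwise (failure) only clear the lock.
async function unlock(collection, id, changes) {
  const update = changes
    ? { $set: { ...changes, locked: false }, $inc: { version: 1 } }
    : { $set: { locked: false } };
  await collection.updateOne({ _id: id, locked: true }, update);
}

// Lock, process, and always unlock. `processEntry` is a hypothetical callback
// that returns the fields to store.
async function processWithLock(collection, doc, processEntry) {
  if (!(await tryLock(collection, doc))) return false; // someone else has it
  try {
    const changes = await processEntry(doc);
    await unlock(collection, doc._id, changes);
    return true;
  } catch (err) {
    await unlock(collection, doc._id); // failed: release without bumping the version
    return false;
  }
}
```

One design point worth noting: a worker that is merely slow, not crashed, could still release a lock that was taken over after the TTL. Also matching on the `lockedAt` value the worker itself wrote when unlocking would be one way to guard against that.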

Regarding:

i have a vps in new york and one in hong kong and i would like to apply the lock on both database servers. So those two vps servers wont perform the same task at any chance.

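I think the answer to this depends on why you need 2 database servers and why they hold the same entries.

If one of them is a secondary in a cross-region replica set for high availability, findAndModify will query the primary, since writing to a secondary replica is not allowed. That is why you do not need to worry about keeping the 2 servers in sync (there may be latency issues, but you will have those anyway, since you are communicating between 2 regions).

If you want it just for sharding and horizontal scaling, there is no need to worry, because each shard will hold different entries, so an entry lock is only relevant to one shard.

Hope this helps someone in the future.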
