Improving performance and understanding the IOPS and usage of a MongoDB Cursor

Question

I am trying to improve the efficiency and speed of my code that uses MongoDB. The code is written in Python and uses the pymongo module.

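Currently, one part of my code receives a list of values that may have been removed from the database and checks which of them were actually removed: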

# One find_one() query (one round trip to the server) per value
verified_removed = []
for value in possibly_removed:
    if db.items.find_one({"name": value}) is None:
        verified_removed.append(value)

I know I could change it to something like this:

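# One query for all values; keep only the "name" field and collect the names in a set
still_exist = {doc["name"] for doc in db.items.find({"name": {"$in": possibly_removed}}, {"name": 1})}
verified_removed = [val for val in possibly_removed if val not in still_exist]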

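But I'm not sure about one thing:
The find method creates a cursor that can be iterated. Is that cursor actually more efficient than calling find_one once for each of my test values, or do my IOPS stay the same in both cases?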

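How exactly does a cursor work? And what is the best way to improve performance when I have to iterate over and update many objects in my database roughly every minute?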

Answer 1

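find() returns a cursor that fetches results from the server in batches rather than making one request per document, so in most cases a single find() with $in will be more efficient than calling find_one() once per value, which costs one round trip each. The documentation has more details: https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/#cursor-batches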
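
As a rough sketch of the batched approach, the function below replaces the per-value find_one() loop with one $in query and lets the cursor pull matches in batches. It assumes, as in the question, that documents store the value in a "name" field and that you pass a pymongo collection such as db.items; the name find_removed and the batch_size default of 1000 are hypothetical choices for illustration, not anything prescribed by MongoDB.

```python
def find_removed(collection, possibly_removed, batch_size=1000):
    """Return the values in possibly_removed that no document's "name" matches.

    Hypothetical helper: one find() with $in replaces one find_one() round
    trip per value; the returned cursor then pulls matches in batches.
    """
    cursor = collection.find(
        {"name": {"$in": list(possibly_removed)}},
        {"name": 1, "_id": 0},  # projection: only send back the field we need
    ).batch_size(batch_size)  # max documents the server returns per batch
    # Iterating the cursor requests further batches from the server on demand.
    still_exist = {doc["name"] for doc in cursor}
    return [value for value in possibly_removed if value not in still_exist]
```

Usage, under the same assumptions: removed = find_removed(db.items, possibly_removed). Collecting the existing names into a set keeps the final membership check cheap even for long lists.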

If you want to improve performance further, consider adding an index on the field you filter on (here, name). Also look into bulk operations.
