Faster way to build inverted list in Mongo

Question

I want to build an inverted list in my MongoDB collection. The collection looks like this:

{ "word" : 2, "docToPos" : { "1" : [ 0 ] } }
{ "word" : 5, "docToPos" : { "1" : [ 1 ] } }
{ "word" : 1, "docToPos" : { "1" : [ 2 ], "2" : [ 1 ] } }
{ "word" : 9, "docToPos" : { "2" : [ 2, 43, 1246 ] } }

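word is an id from a dictionary, and docToPos maps each document to the positions where the word occurs. For example, word 2 is in document 1 at position 0, and word 9 is in document 2 at positions 2, 43 and 1246.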

Every new document I want to add to the database is just an array of word ids:

[23, 43, 75, 18, ... ]

So, using spring-mongo, I have this Java code:

for (int i=0; i < array.length; i++) {
  invertedListDao.upsert(array[i], documentId, i);
}

(the upsert method is my own implementation)

This solution works, but if a document has 100 000 words, it takes 100 000 queries to mongo.

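So finally, my question is: is there a way to do this faster? For example, send the whole array in one query and process it inside the database? I know there is an eval function in mongo, but it is not available in spring-mongo.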

Answer 1

One way to improve performance is to use bulk upserts.

var bulk = db.invertedListDao.initializeUnorderedBulkOp();
for (var i=0; i < array.length; i++){
  bulk.find({...}).upsert().replaceOne({...})
}
bulk.execute();

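The reasons it is more efficient, and the kind of speedup you can expect, are outlined in my answer here. Basically, you make only one call to mongo, no matter how many words you have.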

I am not familiar with java spring mongo, but my preliminary search suggests that it is supported, and I hope you will be able to find out how to implement bulk upserts with your java driver.
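On the Java side, one possible preparation step is to group the positions by word id before building the bulk operation, so that each distinct word needs a single upsert instead of one per occurrence. The sketch below is plain Java with no MongoDB dependency; the class name InvertedListBatch and the method positionsByWord are hypothetical names chosen for illustration. It assumes each grouped entry would then become one upsert that appends all positions at once (for example with $push and $each on "docToPos.<documentId>"), and that your Spring Data MongoDB version exposes bulk operations; both points should be checked against the documentation of the version you use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InvertedListBatch {

    // Groups the positions of every word id that occurs in one document.
    // Key: word id. Value: positions (array indexes) in ascending order.
    static Map<Integer, List<Integer>> positionsByWord(int[] words) {
        // LinkedHashMap keeps the order in which word ids first appear.
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for (int i = 0; i < words.length; i++) {
            result.computeIfAbsent(words[i], k -> new ArrayList<>()).add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] document = {23, 43, 75, 23, 18, 43, 23};
        Map<Integer, List<Integer>> grouped = positionsByWord(document);

        // Assumption: each entry would be turned into ONE upsert inside a
        // single bulk operation, rather than one query per array element.
        grouped.forEach((word, positions) ->
                System.out.println("word " + word + " -> positions " + positions));
    }
}
```

With this grouping, a document of 100 000 words that uses only a few thousand distinct word ids produces a few thousand operations in the batch rather than 100 000.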

P.S. With the help of Bartektartanus, here is the link to the official documentation.

java

upsert

mongodb

spring-mongo