Limit has no effect on cursor time in MongoDB aggregation

Question

I am aggregating data from a collection with 1 million records. The match query uses an index. The code is below for reference:

    AggregateIterable<Document> aggregateIterable = timeCollection.aggregate(Arrays.asList(match, project,group)).batchSize(1000).allowDiskUse(true);
    long curStartTs = Calendar.getInstance().getTimeInMillis();
    MongoCursor<Document> cursor = aggregateIterable.iterator(); //this line roughly takes 15 seconds
    long curEndTs = Calendar.getInstance().getTimeInMillis();
    System.out.println("Cursor time - " + (curEndTs - curStartTs));

The final result list contains 3500 records.

Now I limit the records by passing $limit in the aggregation pipeline:

    Document limitParam = new Document("$limit",30);
    AggregateIterable<Document> aggregateIterable = timeCollection.aggregate(Arrays.asList(match, project,group,limitParam)).batchSize(1000).allowDiskUse(true);
    long curStartTs = Calendar.getInstance().getTimeInMillis();
    MongoCursor<Document> cursor = aggregateIterable.iterator(); //this line still taking around 15 seconds
    long curEndTs = Calendar.getInstance().getTimeInMillis();
    System.out.println("Cursor time - " + (curEndTs - curStartTs));

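The final result list now contains only 30 records.

I cannot understand why the time does not change between the two cases. Even with a limit in the pipeline, why does aggregateIterable.iterator() take the same time as it does without the limit?

Thanks a lot.

Kind regards,

Vibhav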

Answer 1

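The aggregation $limit stage has no effect on the content of the documents it passes along; it only caps how many documents move on to the next stage.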
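One likely reason the timing does not move, assuming most of the 15 seconds is spent in the $group stage: $group has to read every matched document before it can emit any group, so a $limit placed after it only trims the output. The sketch below is plain Java with no MongoDB involved; `GroupThenLimitSketch`, `Result`, and `run` are hypothetical names used only to illustrate the idea, and the explain output remains the way to confirm it for the real pipeline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupThenLimitSketch {

    /** Hypothetical holder for what one simulated pipeline run did. */
    static final class Result {
        final int documentsRead;   // inputs the group stage had to consume
        final int groupsReturned;  // outputs left after the optional limit

        Result(int documentsRead, int groupsReturned) {
            this.documentsRead = documentsRead;
            this.groupsReturned = groupsReturned;
        }
    }

    /**
     * Groups totalDocs synthetic documents into groupCount buckets, then
     * keeps at most limit groups. A negative limit means "no limit".
     */
    static Result run(int totalDocs, int groupCount, int limit) {
        Map<Integer, Long> groups = new LinkedHashMap<>();
        int read = 0;
        // The grouping loop cannot stop early: any later document may
        // still belong to one of the groups that will be returned.
        for (int i = 0; i < totalDocs; i++) {
            read++;
            groups.merge(i % groupCount, 1L, Long::sum);
        }
        // The limit is applied only after grouping has finished.
        List<Map.Entry<Integer, Long>> out = new ArrayList<>(groups.entrySet());
        if (limit >= 0 && out.size() > limit) {
            out = out.subList(0, limit);
        }
        return new Result(read, out.size());
    }

    public static void main(String[] args) {
        // Sizes taken from the question: 1 million inputs, 3500 groups, limit 30.
        Result noLimit = run(1_000_000, 3_500, -1);
        Result withLimit = run(1_000_000, 3_500, 30);
        System.out.println("no limit:   read=" + noLimit.documentsRead
                + ", returned=" + noLimit.groupsReturned);
        System.out.println("with limit: read=" + withLimit.documentsRead
                + ", returned=" + withLimit.groupsReturned);
    }
}
```

Both runs read all 1,000,000 inputs; only the number of returned groups differs (3500 versus 30). Moving $limit ahead of $group would reduce the input, but it would also change which documents get grouped, so it is not an equivalent query.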

Looking at your code:

    long curStartTs = Calendar.getInstance().getTimeInMillis();
    MongoCursor<Document> cursor = aggregateIterable.iterator(); //this line roughly takes 15 seconds
    long curEndTs = Calendar.getInstance().getTimeInMillis();
    System.out.println("Cursor time - " + (curEndTs - curStartTs));

you are trying to measure the time taken to execute the query.
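If client-side timing is still wanted, a small helper built on System.nanoTime() (which is intended for measuring elapsed time) may be a bit cleaner than Calendar. This is only a sketch: `TimedCall`, `Timed`, and `time` are hypothetical names, and the sleeping lambda is a stand-in for the call to aggregateIterable.iterator(), not real driver code.

```java
import java.util.function.Supplier;

class TimedCall {

    /** Hypothetical pair of a result and how long it took to produce. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the supplier once and records the elapsed wall time in milliseconds. */
    static <T> Timed<T> time(Supplier<T> work) {
        long start = System.nanoTime();
        T value = work.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in for aggregateIterable.iterator(): just sleeps briefly.
        Timed<String> opened = time(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "cursor";
        });
        System.out.println("Cursor time - " + opened.elapsedMillis + " ms");
    }
}
```

With the real driver, the same helper could wrap the iterator() call and, separately, the loop that drains the cursor, which shows how the time splits between getting the first batch and fetching the rest.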

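To get a better idea of how much time MongoDB actually spends executing these queries, we can run them in the mongo shell with explain.

Example queries, where { 'conditions' } stands for your own pipeline stages:

Without limit

    db.foo.aggregate([ { 'conditions' }], {explain: true})

With limit

    db.foo.aggregate([{ 'conditions' }, {$limit: 10}], {explain: true})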

You may also want to look at Performance of MongoDB query, Optimize Query, Analyze Query Plan, and cursor limit.

Hope this helps!
