Correctly using allowDiskUse with pymongo

I have a method that works for smaller collections, but my collection is huge, with over 7 million documents. I am essentially trying to group by the keys key1 and key2:

def groupByThreeItems(self, db=None, col=None, key=None, key1=None, key2=None):
    coll = self.client_()[db][col]

    agg_result = coll.aggregate(
        [{'$group': {
            '_id': {key: "$" + key, key1: "$" + key1},
            key2: {"$push": "$" + key2},
            "Count": {"$sum": 1},
        }}],
        {'allow_disk_use': True})
    return [i for i in agg_result]

I get the following error:

AttributeError: 'dict' object has no attribute '_txn_read_preference'

However, when I don't pass allow_disk_use, I get the following error instead:

pymongo.errors.OperationFailure: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in., full error: {'ok': 0.0, 'errmsg': "Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.", 'code': 292, 'codeName': 'QueryExceededMemoryLimitNoDiskUseAllowed'}

How do I fix the disk-use error so that I can use the aggregation framework?

The second positional argument to aggregate is the session, which is why your options dict raised the _txn_read_preference AttributeError. You need to pass options as keyword arguments, and check the documentation for the correct option names:

agg_result = coll.aggregate([...], allowDiskUse=True)
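Applied to the method from the question, the fix is just to move the option out of the positional slot. A minimal sketch, assuming self.client_() returns a pymongo.MongoClient as in the original code:

def groupByThreeItems(self, db=None, col=None, key=None, key1=None, key2=None):
    coll = self.client_()[db][col]

    # allowDiskUse is a keyword argument here, not a positional options dict
    agg_result = coll.aggregate(
        [{'$group': {
            '_id': {key: "$" + key, key1: "$" + key1},
            key2: {"$push": "$" + key2},
            "Count": {"$sum": 1},
        }}],
        allowDiskUse=True,
    )
    return list(agg_result)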

Use allowDiskUse:

agg_result = coll.aggregate(pipeline, allowDiskUse=True)  # pipeline is your list of stages
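For context on the original AttributeError: the second positional parameter of Collection.aggregate is session, so the options dict was being treated as a session object. If you actually need a session, it has to be a ClientSession passed by keyword. A hedged sketch (the database and collection names are hypothetical):

from pymongo import MongoClient

client = MongoClient()            # illustrative connection
coll = client["mydb"]["mycol"]    # hypothetical database/collection names
pipeline = [{'$group': {'_id': "$key1", "Count": {"$sum": 1}}}]

# a session must be a ClientSession object, passed by keyword
with client.start_session() as session:
    agg_result = coll.aggregate(pipeline, session=session, allowDiskUse=True)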

https://pymongo.readthedocs.io/en/stable/api/pymongo/database.html?highlight=allowDiskUse#pymongo.database.Database.aggregate

All optional aggregate command parameters should be passed as keyword arguments to this method. Valid options include, but are not limited to:

allowDiskUse (bool): Enables writing to temporary files. When set to True, aggregation stages can write data to the _tmp subdirectory of the --dbpath directory. The default is False.
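In pymongo the other documented options ride along the same way, as keyword arguments; for example (the maxTimeMS value below is only illustrative):

agg_result = coll.aggregate(
    pipeline,
    allowDiskUse=True,  # let $group/$sort spill to temporary files
    maxTimeMS=60000,    # illustrative 60-second server-side time limit
)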

https://docs.mongodb.com/manual/reference/method/db.collection.aggregate/#db.collection.aggregate

allowDiskUse boolean Optional. Enables writing to temporary files. When set to true, aggregation operations can write data to the _tmp subdirectory in the dbPath directory. See Perform Large Sort Operation with External Sort for an example.

Starting in MongoDB 4.2, the profiler log messages and diagnostic log messages include a usedDisk indicator if any aggregation stage wrote data to temporary files due to memory restrictions.