MongoDB aggregation query optimization: $match, $lookup and double $unwind

Question

Suppose we have two collections:

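devices: the documents in this collection have (among other fields) a name field (string) and a cards field (array); each element of that array has the fields model and slot. The cards are not a separate collection, just nested data.
interfaces: the documents in this collection have (among other fields) the fields name and owner.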

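Extra information:

For cards, I'm only interested in the ones whose slot is a number.
For each card of a device that meets the previous condition, there is an interface document in the other collection whose owner field is the name of the device in question and whose name is s[slot]p1 (the character 's' + the card's slot + 'p1').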
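To make the join condition concrete, here is a plain-Python sketch of the rule (no MongoDB involved). The function names and sample documents are hypothetical, and it assumes slot is stored as a string, as the $concat in the query implies.

```python
import re

# Digits only, e.g. "3" or "12"; fullmatch avoids the trailing-newline quirk of "$".
NUMERIC_SLOT = re.compile(r"\d+")


def interface_name(slot):
    """Interface name for a card slot: 's' + slot + 'p1', e.g. '3' -> 's3p1'."""
    return "s" + slot + "p1"


def matching_interface(device, card, interfaces):
    """Return the interface owned by `device` and named s<slot>p1, or None."""
    slot = card.get("slot")
    # Only cards whose slot is a (string) number are of interest.
    if not isinstance(slot, str) or not NUMERIC_SLOT.fullmatch(slot):
        return None
    wanted = interface_name(slot)
    for iface in interfaces:
        if iface.get("owner") == device["name"] and iface.get("name") == wanted:
            return iface
    return None


# Hypothetical sample data.
device = {"name": "dev1", "cards": [{"model": "X", "slot": "3"}, {"model": "Y", "slot": "A"}]}
interfaces = [
    {"owner": "dev1", "name": "s3p1"},
    {"owner": "dev1", "name": "s3p2"},
    {"owner": "dev2", "name": "s3p1"},
]
print(matching_interface(device, device["cards"][0], interfaces))  # {'owner': 'dev1', 'name': 's3p1'}
print(matching_interface(device, device["cards"][1], interfaces))  # None
```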

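My task is to write a query that produces a summary of all existing cards across all these devices, with each entry enriched with information from the interfaces collection. I also need to be able to parameterize the query (for example, if I'm only interested in a specific device with a specific name, or only in cards of a specific model).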

So far, I have this:

mongo_client.devices.aggregate([
    # Retrieve all the devices that have the cards field
    {
        "$match": {
            # "name": "<device-name>",
            "cards": {
                "$exists": True
            }
        }
    },

    # Emit one document per element of the cards array
    {
        "$unwind": "$cards"
    },

    # Keep only the cards whose slot is a number (a string of digits)
    {
        "$match": {
            "cards.slot": {
                "$regex": r"^\d+$"
            }
        }
    },

    # Retrieve all of the device's interfaces
    {
        "$lookup": {
            "from": "interfaces",
            "let": {
                "owner": "$name",
            },
            "as": "interfaces",
            "pipeline": [{
                "$match": {
                    "$expr": {
                        "$eq": ["$owner", "$$owner"]
                    },
                },
            }]
        }
    },

    # Emit one document per (card, interface) pair
    {
        "$unwind": "$interfaces"
    },

    # Keep only the interface named s<slot>p1
    {
        "$match": {
            "$expr": {
                "$eq": ["$interfaces.name", {
                    "$concat": ["s", "$cards.slot", "p1"]
                }]
            }
        }
    },

    # Build the final object
    {
        "$project": {
            # Card related fields
            "slot": "$cards.slot",
            "model": "$cards.model",

            # Device related fields
            "device_name": "$name",

            # Fields from interfaces
            "interface_field_x": "$interfaces.interface_field_x",
            "interface_field_y": "$interfaces.interface_field_y",
        }
    },
])

The query works and it's very fast, but I have a few questions:

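Is there any way to avoid the second $unwind? Each device has 50-150 interface documents whose owner is the device's name, and I feel like this is slowing the query down. Each device has a single interface named s[slot]p1. How can I get that specific document in a better way? I tried using two $eq expressions in the $match inside the $lookup, and even $regex or $regexMatch, but I couldn't access the outer document's field there, even when I put it in let.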
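If I want to parameterize my query to filter the data when needed, would you add the match expressions as intermediate steps or only filter at the end?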

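Any other improvements to the query are welcome. I'm also interested in how to make it error-proof (for example, if cards is missing or the s1p1 interface is not found).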
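For the error-proofing part, one possible sketch: $unwind accepts a preserveNullAndEmptyArrays option, and $ifNull can substitute a default when the interface was not found. The names below are hypothetical, interface_field_x/interface_field_y are the question's placeholders, and the projection assumes an earlier stage stored the matched interface (if any) in a field called interface, for example via $arrayElemAt on the looked-up array.

```python
# $unwind normally drops a document whose array is missing, null or empty;
# with preserveNullAndEmptyArrays it is passed through without that field.
# Only useful if later $match stages also let such devices through.
UNWIND_CARDS_KEEP_MISSING = {
    "$unwind": {"path": "$cards", "preserveNullAndEmptyArrays": True}
}


def summary_projection(default=None):
    """Final $project that tolerates a card whose s<slot>p1 interface was not found.

    Assumes an earlier stage stored the matched interface (if any) in `interface`.
    """

    def interface_field(name):
        # $ifNull yields `default` when the field is null or missing.
        return {"$ifNull": ["$interface." + name, default]}

    return {
        "$project": {
            "slot": "$cards.slot",
            "model": "$cards.model",
            "device_name": "$name",
            "interface_field_x": interface_field("interface_field_x"),
            "interface_field_y": interface_field("interface_field_y"),
        }
    }
```

Without these, a device lacking cards or a card lacking its interface is silently left out of the result rather than raising an error.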

Thanks!

Answer 1

Your question is missing sample data for the query, but:

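Merge the third stage into the first one and drop $exists (a device can only match a condition on cards.slot if cards exists).
Use localField + foreignField instead of a pipeline in the $lookup; pipeline lookups are much slower.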
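A sketch of the two suggestions above applied to the question's pipeline, assuming MongoDB 3.4+ (for $addFields) and slot stored as a string. The function build_pipeline and its optional device_name/model filters are hypothetical; they also show one answer to the parameterization question: put filters as early as possible, since a $match at the start of a pipeline can use indexes. Note that the first $match selects whole devices, so a per-card filter after $unwind is still needed if a device can mix numeric and non-numeric slots.

```python
# Hypothetical filter for "slot is a string of digits".
NUMERIC_SLOT = {"$regex": r"^\d+$"}


def build_pipeline(device_name=None, model=None):
    """Sketch of the question's pipeline with a merged first $match and an
    equality $lookup. `device_name` and `model` are hypothetical filters."""
    # One $match instead of $exists + a later slot match: a device can only
    # satisfy $elemMatch if `cards` exists. Running first, it can use indexes.
    card_condition = {"slot": NUMERIC_SLOT}
    if model is not None:
        card_condition["model"] = model
    device_match = {"cards": {"$elemMatch": card_condition}}
    if device_name is not None:
        device_match["name"] = device_name

    return [
        {"$match": device_match},
        {"$unwind": "$cards"},
        # The first $match keeps whole devices, so a device with slots "1" and
        # "A" still yields the "A" card here; filter the single card again.
        {"$match": {"cards." + k: v for k, v in card_condition.items()}},
        # Equality join instead of a sub-pipeline.
        {
            "$lookup": {
                "from": "interfaces",
                "localField": "name",
                "foreignField": "owner",
                "as": "interfaces",
            }
        },
        # Pick the s<slot>p1 interface out of the array without a second $unwind.
        {
            "$addFields": {
                "interface": {
                    "$arrayElemAt": [
                        {
                            "$filter": {
                                "input": "$interfaces",
                                "as": "iface",
                                "cond": {
                                    "$eq": [
                                        "$$iface.name",
                                        {"$concat": ["s", "$cards.slot", "p1"]},
                                    ]
                                },
                            }
                        },
                        0,
                    ]
                }
            }
        },
        # The full interfaces array is not projected, so it is dropped here.
        {
            "$project": {
                "slot": "$cards.slot",
                "model": "$cards.model",
                "device_name": "$name",
                "interface_field_x": "$interface.interface_field_x",
                "interface_field_y": "$interface.interface_field_y",
            }
        },
    ]
```

It would be passed to aggregate like the original, e.g. mongo_client.devices.aggregate(build_pipeline(device_name="<device-name>")).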

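The number of $unwind stages in the query should correspond to the objects you want in the result set:

one result per device: 0 $unwind stages
one result per card: 1 $unwind
one result per interface: 2 $unwind stages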

Matching the required conditions does not require an $unwind.
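As a sketch of matching both conditions inside the $lookup itself (assuming MongoDB 3.6+ for let/pipeline lookups; the function name is hypothetical), the interface name can be computed in let and compared with $expr. This is also what the question's let attempt needs: variables declared in let are read as $$name, and in the sub-pipeline's $match only inside $expr. The answer considers pipeline lookups slower than localField/foreignField; on MongoDB 5.0 and later the two forms can be combined, with localField/foreignField handling owner == name and the pipeline only checking the interface name.

```python
def lookup_slot_interface():
    """Hypothetical stages: fetch only the s<slot>p1 interface of the device."""
    return [
        {
            "$lookup": {
                "from": "interfaces",
                # Values from the outer (device + card) document.
                "let": {
                    "owner": "$name",
                    "wanted": {"$concat": ["s", "$cards.slot", "p1"]},
                },
                "pipeline": [
                    {
                        "$match": {
                            "$expr": {
                                "$and": [
                                    {"$eq": ["$owner", "$$owner"]},
                                    {"$eq": ["$name", "$$wanted"]},
                                ]
                            }
                        }
                    },
                    # Each device has a single s<slot>p1 interface.
                    {"$limit": 1},
                ],
                "as": "matched",
            }
        },
        # Turn the 0- or 1-element array into a plain field; no second $unwind.
        {"$addFields": {"interface": {"$arrayElemAt": ["$matched", 0]}}},
        {"$project": {"matched": 0}},
    ]
```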

Tags: mongodb, pymongo, mongodb-query, aggregation-framework, pymongo-3.x