How to make queries on nested lists with MongoDB

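Context

I have a legacy MongoDB database with 4 collections (see the models below). My goal is to query all observations that have at least one metadata document with is_valid set to false and observation.results.stitched.model_outputs.results.predictions.metadata.created_at between two dates. By "at least one", I mean that if an observation has one model output prediction that meets both requirements, the query should return the observation together with all of its model outputs and all of the predictions of those model outputs.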

Models

Observation collection:

{
  _id: ObjectId("abcd-1234"),
  created_at:2022-04-28T11:14:20.002+00:00,
  status: "persisted",
  results: {
    stitched: {
      full_image: "https://www.path_to_image.com",
      model_outputs: [
        ObjectId("abcs-1243"),
        ObjectId("abce-1247")
      ]
    }
  } 
}

model_output collection:

{
  _id: ObjectId("abcs-1243"),
  created_at:2022-04-28T11:14:20.002+00:00,
  status: "persisted",
  results: {
    raw_predictions: "https://www.path_to_large_array.com",
    binary_mark: "https://www.path_to_binary_mask.com",
    predictions: [
      ObjectId("wbcs-124e"),
      ObjectId("awds-234e"),
      ObjectId("jnla-1233"),
      ...,
      ObjectId("jawd-1039")
    ]
  } 
}

Prediction collection:

{
  _id: ObjectId("wbcs-124e"),
  created_at:2022-04-28T11:14:20.002+00:00,
  status: "persisted",
  area: 21484060.5,
  perimeter: 1640.724417686462,
  bounding_box: [
    39,
    281,
    630,
    602
  ],
  max_width: 5751,
  max_height: 3871,
  metadata: [
    ObjectId("mwao-1243"),
    ObjectId("lksk-8293"),
    ObjectId("psdk-1293")
  ]
}

Metadata collection (essentially a history of all human annotations on a model's predictions):

{
  _id: ObjectId("mwao-1243"),
  created_at:2022-04-29T14:10:29.122+00:00,
  status: "persisted",
  type: "human label",
  is_valid: false,
  comment: "this prediction is a mistake"
}

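Current solution

My current solution is to set an observation_id field on the metadata collection to keep track of the parent observation. I can then query the metadata collection, collect the list of observation_id values, and use an aggregation pipeline to query the nested objects. I would prefer a single monster query over 2 queries, because adding observation_id is redundant.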
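For reference, the two-step approach described above could be sketched roughly as follows. This is only a sketch under assumptions: the collections are taken to be named metadata, observation, model_output and prediction, every metadata document is assumed to carry the extra observation_id field, and the helper names buildMetadataFilter and buildObservationPipeline are made up for illustration.

```javascript
// Step 1 filter: metadata documents flagged invalid within a date range.
// Assumes metadata documents carry the extra observation_id field.
function buildMetadataFilter(start, end) {
  return {
    is_valid: false,
    created_at: { $gte: start, $lte: end }
  };
}

// Step 2 pipeline: load the matching observations and join their nested documents.
function buildObservationPipeline(observationIds) {
  return [
    { $match: { _id: { $in: observationIds } } },
    {
      $lookup: {
        from: "model_output",
        localField: "results.stitched.model_outputs",
        foreignField: "_id",
        as: "model_outputs"
      }
    },
    {
      $lookup: {
        from: "prediction",
        localField: "model_outputs.results.predictions",
        foreignField: "_id",
        as: "predictions"
      }
    }
  ];
}

// Usage in mongosh (needs a live database, so it is left commented out):
// const filter = buildMetadataFilter(
//   new Date("2022-04-28T00:00:00Z"),
//   new Date("2022-04-30T00:00:00Z")
// );
// const ids = db.metadata.distinct("observation_id", filter);
// db.observation.aggregate(buildObservationPipeline(ids));
```

The drawback is the one already mentioned: two round trips to the database, plus a denormalized observation_id that has to be kept in sync.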

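If I understand you correctly, you need three $lookup stages, followed by a $filter and a $match. This brings in all the data and creates a field matchingDocs that counts, for each observation document, the number of metadata documents that meet the requirements. Then only the observation documents with more than 0 matchingDocs are kept. For example: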

db.observation.aggregate([
  {
    $lookup: {
      from: "model_output",
      localField: "results.stitched.model_outputs",
      foreignField: "_id",
      as: "model_outputs"
    }
  },
  {
    $lookup: {
      from: "prediction",
      localField: "model_outputs.results.predictions",
      foreignField: "_id",
      as: "predictions"
    }
  },
  {
    $lookup: {
      from: "metadata",
      localField: "predictions.metadata",
      foreignField: "_id",
      as: "metadata"
    }
  },
  {
    $addFields: {
      matchingDocs: {
        $size: {
          $filter: {
            input: "$metadata",
            as: "item",
            cond: {
              $and: [
                {
                  $eq: [
                    "$$item.is_valid",
                    false
                  ]
                },
                {
                  $gte: [
                    "$$item.created_at",
                    ISODate("2022-04-28T14:10:29.122+00:00")
                  ]
                },
                {
                  $lte: [
                    "$$item.created_at",
                    ISODate("2022-04-30T14:10:29.122+00:00")
                  ]
                }
              ]
            }
          }
        }
      }
    }
  },
  {$match: {matchingDocs: {$gt: 0}}},
  {$unset: "matchingDocs"}
])
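
To make the role of matchingDocs concrete, here is a small in-memory sketch in plain JavaScript of what the $filter, $size and $match stages compute once the $lookup stages have attached the metadata array. It does not talk to MongoDB; the function names and the sample documents are made up for illustration, and the date bounds are the ones used in the pipeline above.

```javascript
// Counts the metadata entries that are invalid and created within [start, end],
// mirroring the $size-of-$filter expression in the pipeline.
function countMatchingDocs(metadata, start, end) {
  return metadata.filter(
    (item) =>
      item.is_valid === false &&
      item.created_at >= start &&
      item.created_at <= end
  ).length;
}

// Keeps only observations with at least one matching metadata entry,
// mirroring {$match: {matchingDocs: {$gt: 0}}}.
function filterObservations(observations, start, end) {
  return observations.filter(
    (obs) => countMatchingDocs(obs.metadata, start, end) > 0
  );
}

// Hypothetical sample data: observations after the three $lookup stages.
const start = new Date("2022-04-28T14:10:29.122Z");
const end = new Date("2022-04-30T14:10:29.122Z");
const observations = [
  {
    _id: "obs-1",
    metadata: [
      { is_valid: false, created_at: new Date("2022-04-29T14:10:29.122Z") },
      { is_valid: true, created_at: new Date("2022-04-29T15:00:00.000Z") }
    ]
  },
  {
    _id: "obs-2",
    // Invalid, but outside the date range, so it does not count.
    metadata: [
      { is_valid: false, created_at: new Date("2022-05-02T09:00:00.000Z") }
    ]
  }
];

const kept = filterObservations(observations, start, end);
console.log(kept.map((obs) => obs._id)); // [ 'obs-1' ]
```

Note that an observation is kept whole as soon as one metadata entry qualifies, which matches the "at least one" requirement in the question.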
