How to make queries on nested lists with MongoDB
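Context
I have a legacy MongoDB database with 4 collections (see the models below). My goal is to query all observations that have at least one metadata document whose is_valid field is set to false and whose created_at (reached through observation.results.stitched.model_outputs.results.predictions.metadata.created_at) falls between 2 dates. By "at least one", I mean that if an observation has one model output prediction that meets both requirements, the query should return the observation together with all of its model outputs and the predictions of those model outputs.
Models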
observation collection:
{
  _id: ObjectId("abcd-1234"),
  created_at:2022-04-28T11:14:20.002+00:00,
  status: "persisted",
  results: {
    stitched: {
      full_image: "https://www.path_to_image.com",
      model_outputs: [
        ObjectId("abcs-1243"),
        ObjectId("abce-1247")
      ]
    }
  }
}
model_output collection:
{
  _id: ObjectId("abcs-1243"),
  created_at:2022-04-28T11:14:20.002+00:00,
  status: "persisted",
  results: {
    raw_predictions: "https://www.path_to_large_array.com",
    binary_mark: "https://www.path_to_binary_mask.com",
    predictions: [
      ObjectId("wbcs-124e"),
      ObjectId("awds-234e"),
      ObjectId("jnla-1233"),
      ...,
      ObjectId("jawd-1039")
    ]
  }
}
prediction collection:
{
  _id: ObjectId("wbcs-124e"),
  created_at:2022-04-28T11:14:20.002+00:00,
  status: "persisted",
  area: 21484060.5,
  perimeter: 1640.724417686462,
  bounding_box: [
    39,
    281,
    630,
    602
  ],
  max_width: 5751,
  max_height: 3871,
  metadata: [
    ObjectId("mwao-1243"),
    ObjectId("lksk-8293"),
    ObjectId("psdk-1293")
  ]
}
metadata collection (a kind of history of all the human annotations made on a model prediction):
{
  _id: ObjectId("mwao-1243"),
  created_at:2022-04-29T14:10:29.122+00:00,
  status: "persisted",
  type: "human label",
  is_valid: false,
  comment: "this prediction is a mistake"
}
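Current solution
My current solution is to set an observation_id field on the metadata collection to keep track of the parent observation. I can then query the metadata collection, get the list of observation_id values, and use an aggregation pipeline to query the nested objects. I would prefer a single monster query over 2 queries, because adding observation_id is redundant.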
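For readers who want to see the two-query approach spelled out, here is a minimal sketch. It only builds the filter and the pipeline as plain objects, so it runs without a database. The function names buildMetadataFilter and buildObservationPipeline are hypothetical, and the collection names "model_output" and "prediction" are assumed from the model labels above.

```javascript
// Step 1 (hypothetical helper): filter for the metadata collection,
// selecting invalid labels created inside the date range.
function buildMetadataFilter(from, to) {
  return { is_valid: false, created_at: { $gte: from, $lte: to } };
}

// Step 2 (hypothetical helper): pipeline that loads the selected
// observations together with their model outputs and predictions.
// Collection names are assumptions based on the models above.
function buildObservationPipeline(observationIds) {
  return [
    { $match: { _id: { $in: observationIds } } },
    {
      $lookup: {
        from: "model_output",
        localField: "results.stitched.model_outputs",
        foreignField: "_id",
        as: "model_outputs"
      }
    },
    {
      $lookup: {
        from: "prediction",
        localField: "model_outputs.results.predictions",
        foreignField: "_id",
        as: "predictions"
      }
    }
  ];
}

// Possible usage in mongosh (not executed here):
// const ids = db.metadata.distinct("observation_id", buildMetadataFilter(from, to));
// db.observation.aggregate(buildObservationPipeline(ids));
```

The extra observation_id field is what makes step 1 possible, which is the redundancy described above.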
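If I understand you correctly, you need three $lookup stages, followed by $filter and $match. This brings in all the data and creates a field, matchingDocs, that counts for each observation document the number of metadata documents that meet your requirements. Then only the observation documents with more than 0 matchingDocs are matched. For example: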
db.observation.aggregate([
  {
    $lookup: {
      from: "model_output",
      localField: "results.stitched.model_outputs",
      foreignField: "_id",
      as: "model_outputs"
    }
  },
  {
    $lookup: {
      from: "prediction",
      localField: "model_outputs.results.predictions",
      foreignField: "_id",
      as: "predictions"
    }
  },
  {
    $lookup: {
      from: "metadata",
      localField: "predictions.metadata",
      foreignField: "_id",
      as: "metadata"
    }
  },
  {
    $addFields: {
      matchingDocs: {
        $size: {
          $filter: {
            input: "$metadata",
            as: "item",
            cond: {
              $and: [
                {$eq: ["$$item.is_valid", false]},
                {$gte: ["$$item.created_at", ISODate("2022-04-28T14:10:29.122+00:00")]},
                {$lte: ["$$item.created_at", ISODate("2022-04-30T14:10:29.122+00:00")]}
              ]
            }
          }
        }
      }
    }
  },
  {$match: {matchingDocs: {$gt: 0}}},
  {$unset: "matchingDocs"}
])
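To make the counting step easier to follow, here is a small in-memory sketch of what the $filter, $size and $match stages do to one observation after the lookups. It is plain JavaScript and does not talk to MongoDB. The function names countMatching and keepObservation are hypothetical, and the sample values are taken from the metadata model in the question.

```javascript
// Counts metadata documents that are invalid and created inside [from, to].
// Mirrors the $filter + $size expression in the pipeline (hypothetical helper).
function countMatching(metadata, from, to) {
  return metadata.filter(
    (item) =>
      item.is_valid === false &&
      item.created_at >= from &&
      item.created_at <= to
  ).length;
}

// Mirrors the final $match: keep the observation if at least one document matched.
function keepObservation(observation, from, to) {
  return countMatching(observation.metadata, from, to) > 0;
}

const from = new Date("2022-04-28T14:10:29.122Z");
const to = new Date("2022-04-30T14:10:29.122Z");

// Shape of one observation after the three lookups (sample data).
const observation = {
  metadata: [
    { is_valid: false, created_at: new Date("2022-04-29T14:10:29.122Z") },
    { is_valid: true, created_at: new Date("2022-04-29T15:00:00.000Z") }
  ]
};

console.log(countMatching(observation.metadata, from, to)); // 1
console.log(keepObservation(observation, from, to)); // true
```

Because the whole observation is kept or dropped, the other model outputs and predictions looked up for it come along unchanged, which matches the "at least one" requirement.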