MongoDB 聚合查询优化:匹配 -> 展开 -> 匹配与展开 -> 匹配
MongoDB Aggregation Query Optimization : match -> unwind -> match vs unwind->match
输入数据
{
"_id" : ObjectId("5dc7ac6e720a2772c7b76671"),
"idList" : [
{
"queueUpdateTimeStamp" : "2019-12-12T07:16:47.577Z",
"displayId" : "H14",
"currentQueue" : "10",
"isRejected" : true,
"isDispacthed" : true
},
{
"queueUpdateTimeStamp" : "2019-12-12T07:16:47.577Z",
"displayId" : "H14",
"currentQueue" : "10",
"isRejected" : true,
"isDispacthed" : false
}
],
"poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f"),
"processtype" : 1
}
输出数据
{
"_id" : ObjectId("5dc7ac6e720a2772c7b76671"),
"idList":
{
"queueUpdateTimeStamp" : "2019-12-12T07:16:47.577Z",
"displayId" : "H14",
"currentQueue" : "10",
"isRejected" : true,
"isDispacthed" : true
},
"poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f"),
"processtype" : 1
}
查询 1(unwind
然后 match
)
aggregate([
{
$unwind: { path: "$idList" }
},
{
$match: { 'idList.isDispacthed': isDispatched }
}
])
查询 2(match
然后 unwind
然后 match
)
aggregate([
{
$match: { 'idList.isDispacthed': isDispatched }
},
{
$unwind: { path: "$idList" }
},
{
$match: { 'idList.isDispacthed': isDispatched }
}
])
我的问题/我的顾虑
(假设我在此集合中有大量文档(50k +)并假设我在同一管道中进行此查询后还有其他查找和预测)
match -> unwind -> match
VS unwind ->match
- 这两个查询之间有性能差异吗?
- 还有其他(更好的)方法来编写此查询吗?
这完全取决于 MongoDB 查询规划器优化器:
Aggregation pipeline operations have an optimization phase which attempts to reshape the pipeline for improved performance.
To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.
https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/
为 poDetailsId
和 运行 这个查询创建索引:
db.getCollection('collection').explain().aggregate([
{
$unwind: "$idList"
},
{
$match: {
'idList.isDispacthed': true,
"poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f")
}
}
])
{
"stages" : [
{
"$cursor" : {
"query" : {
"poDetailsId" : {
"$eq" : ObjectId("5dc7ac15720a2772c7b7666f")
}
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.collection",
"indexFilterSet" : false,
"parsedQuery" : {
"poDetailsId" : {
"$eq" : ObjectId("5dc7ac15720a2772c7b7666f")
}
},
"queryHash" : "2CF7E390",
"planCacheKey" : "A8739F51",
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"poDetailsId" : 1.0
},
"indexName" : "poDetailsId_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"poDetailsId" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"poDetailsId" : [
"[ObjectId('5dc7ac15720a2772c7b7666f'), ObjectId('5dc7ac15720a2772c7b7666f')]"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$unwind" : {
"path" : "$idList"
}
},
{
"$match" : {
"idList.isDispacthed" : {
"$eq" : true
}
}
}
],
"ok" : 1.0
}
如您所见,MongoDB 会将此聚合更改为:
db.getCollection('collection').aggregate([
{
$match: { "poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f") }
}
{
$unwind: "$idList"
},
{
$match: { 'idList.isDispacthed': true }
}
])
从逻辑上讲,$match -> $unwind -> $match
更好,因为您过滤(按索引)记录的子集而不是完全扫描(处理 100 个匹配的文档≠所有文档)。
If your aggregation operation requires only a subset of the data in a collection, use the $match
, $limit
, and $skip
stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline, $match
operations use suitable indexes to scan only the matching documents in a collection.
https://docs.mongodb.com/manual/core/aggregation-pipeline/#early-filtering
操作文档后,MongoDB 无法应用索引。
输入数据
{
"_id" : ObjectId("5dc7ac6e720a2772c7b76671"),
"idList" : [
{
"queueUpdateTimeStamp" : "2019-12-12T07:16:47.577Z",
"displayId" : "H14",
"currentQueue" : "10",
"isRejected" : true,
"isDispacthed" : true
},
{
"queueUpdateTimeStamp" : "2019-12-12T07:16:47.577Z",
"displayId" : "H14",
"currentQueue" : "10",
"isRejected" : true,
"isDispacthed" : false
}
],
"poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f"),
"processtype" : 1
}
输出数据
{
"_id" : ObjectId("5dc7ac6e720a2772c7b76671"),
"idList":
{
"queueUpdateTimeStamp" : "2019-12-12T07:16:47.577Z",
"displayId" : "H14",
"currentQueue" : "10",
"isRejected" : true,
"isDispacthed" : true
},
"poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f"),
"processtype" : 1
}
查询 1(unwind
然后 match
)
aggregate([
{
$unwind: { path: "$idList" }
},
{
$match: { 'idList.isDispacthed': isDispatched }
}
])
查询 2(match
然后 unwind
然后 match
)
aggregate([
{
$match: { 'idList.isDispacthed': isDispatched }
},
{
$unwind: { path: "$idList" }
},
{
$match: { 'idList.isDispacthed': isDispatched }
}
])
我的问题/我的顾虑
(假设我在此集合中有大量文档(50k +)并假设我在同一管道中进行此查询后还有其他查找和预测)
match -> unwind -> match
VS unwind ->match
- 这两个查询之间有性能差异吗?
- 还有其他(更好的)方法来编写此查询吗?
这完全取决于 MongoDB 查询规划器优化器:
Aggregation pipeline operations have an optimization phase which attempts to reshape the pipeline for improved performance.
To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.
https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/
为 poDetailsId
和 运行 这个查询创建索引:
db.getCollection('collection').explain().aggregate([
{
$unwind: "$idList"
},
{
$match: {
'idList.isDispacthed': true,
"poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f")
}
}
])
{
"stages" : [
{
"$cursor" : {
"query" : {
"poDetailsId" : {
"$eq" : ObjectId("5dc7ac15720a2772c7b7666f")
}
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.collection",
"indexFilterSet" : false,
"parsedQuery" : {
"poDetailsId" : {
"$eq" : ObjectId("5dc7ac15720a2772c7b7666f")
}
},
"queryHash" : "2CF7E390",
"planCacheKey" : "A8739F51",
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"poDetailsId" : 1.0
},
"indexName" : "poDetailsId_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"poDetailsId" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"poDetailsId" : [
"[ObjectId('5dc7ac15720a2772c7b7666f'), ObjectId('5dc7ac15720a2772c7b7666f')]"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$unwind" : {
"path" : "$idList"
}
},
{
"$match" : {
"idList.isDispacthed" : {
"$eq" : true
}
}
}
],
"ok" : 1.0
}
如您所见,MongoDB 会将此聚合更改为:
db.getCollection('collection').aggregate([
{
$match: { "poDetailsId" : ObjectId("5dc7ac15720a2772c7b7666f") }
}
{
$unwind: "$idList"
},
{
$match: { 'idList.isDispacthed': true }
}
])
从逻辑上讲,$match -> $unwind -> $match
更好,因为您过滤(按索引)记录的子集而不是完全扫描(处理 100 个匹配的文档≠所有文档)。
If your aggregation operation requires only a subset of the data in a collection, use the
$match
,$limit
, and$skip
stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline,$match
operations use suitable indexes to scan only the matching documents in a collection.
https://docs.mongodb.com/manual/core/aggregation-pipeline/#early-filtering