Indexing MongoDB for sort consistency
The MongoDB documentation states that MongoDB does not store documents in a collection in a particular order. So if you have this collection:
db.restaurants.insertMany( [
   { "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan"},
   { "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens"},
   { "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn"},
   { "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan"},
   { "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn"},
] );
and sort it like this:
db.restaurants.aggregate(
   [
     { $sort : { borough : 1 } }
   ]
)
then the sort order may be inconsistent, because:
the borough field contains duplicate values for both Manhattan and Brooklyn. Documents are returned in alphabetical order by borough, but the order of those documents with duplicate values for borough might not be the same across multiple executions of the same sort.
To return a consistent result, the documentation recommends modifying the query to:
db.restaurants.aggregate(
   [
     { $sort : { borough : 1, _id: 1 } }
   ]
)
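To illustrate why the extra _id key makes the result repeatable, here is a minimal in-memory sketch in plain JavaScript. It does not call MongoDB; the helpers byBoroughThenId and sortedIds are hypothetical names introduced only for this illustration, and the comparator merely mirrors the { borough: 1, _id: 1 } sort on the sample documents.

```javascript
// In-memory sketch (plain JavaScript, not a MongoDB call): a comparator that
// mirrors { $sort: { borough: 1, _id: 1 } } on the sample documents.
const restaurants = [
  { _id: 1, name: "Central Park Cafe", borough: "Manhattan" },
  { _id: 2, name: "Rock A Feller Bar and Grill", borough: "Queens" },
  { _id: 3, name: "Empire State Pub", borough: "Brooklyn" },
  { _id: 4, name: "Stan's Pizzaria", borough: "Manhattan" },
  { _id: 5, name: "Jane's Deli", borough: "Brooklyn" },
];

// Hypothetical helper: compare by borough first, then break ties on the unique _id.
function byBoroughThenId(a, b) {
  if (a.borough !== b.borough) return a.borough < b.borough ? -1 : 1;
  return a._id - b._id;
}

// Sort a copy of the input and return only the _id values for easy comparison.
function sortedIds(docs) {
  return [...docs].sort(byBoroughThenId).map((d) => d._id);
}

// The same documents arriving in two different orders give the same result,
// because no two documents ever compare as equal.
const forward = sortedIds(restaurants);
const reversed = sortedIds([...restaurants].reverse());
console.log(forward, reversed); // both are [3, 5, 1, 4, 2]
```

Because _id is unique, the comparator never reports a tie, so the outcome no longer depends on the order in which the documents happen to be read.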
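My question is about the efficiency of such a query. Suppose you have millions of documents: should you create a compound index, such as { borough: 1, _id: -1 }, to make it efficient? Or is an index on { borough: 1 } enough, given the potentially special nature of the _id field?
I'm using MongoDB 4.4.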
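If you need a stable sort, you have to sort on both fields, and for the query to perform well you will need a compound index on both fields:
{ borough: 1, _id: -1 }
Note that the key directions in the index must match the sort, or be its exact inverse. The index above serves a sort on { borough: 1, _id: -1 } (or { borough: -1, _id: 1 }); for the { borough: 1, _id: 1 } sort shown in the question, the index would need to be { borough: 1, _id: 1 }.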
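As a rough way to reason about which index key patterns can serve a given sort, the following sketch encodes a simplified version of the rule for compound indexes: the sort keys must line up with a leading prefix of the index keys, with every direction either identical or flipped. The function indexSupportsSort is a hypothetical helper written for this illustration, not part of any MongoDB API, and it ignores other planner details such as equality matches on leading index keys.

```javascript
// Simplified sketch: a sort can be served by an index when its keys match a
// leading prefix of the index keys with all directions identical (forward scan)
// or all directions flipped (backward scan).
// indexSupportsSort is a hypothetical helper, not a MongoDB function.
function indexSupportsSort(indexKeys, sortSpec) {
  const index = Object.entries(indexKeys);
  const sort = Object.entries(sortSpec);
  if (sort.length === 0 || sort.length > index.length) return false;

  let sameDirection = true;
  let inverseDirection = true;
  for (let i = 0; i < sort.length; i++) {
    const [indexField, indexDir] = index[i];
    const [sortField, sortDir] = sort[i];
    if (indexField !== sortField) return false; // field order must match
    if (indexDir !== sortDir) sameDirection = false;
    if (indexDir !== -sortDir) inverseDirection = false;
  }
  return sameDirection || inverseDirection;
}

const wantedSort = { borough: 1, _id: 1 };
console.log(indexSupportsSort({ borough: 1, _id: 1 }, wantedSort));   // true
console.log(indexSupportsSort({ borough: -1, _id: -1 }, wantedSort)); // true
console.log(indexSupportsSort({ borough: 1, _id: -1 }, wantedSort));  // false
console.log(indexSupportsSort({ borough: 1 }, wantedSort));           // false
```

On a real deployment, running the query with explain and checking whether the plan contains a separate sort stage is the reliable way to confirm that an index is actually used for the sort.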