Index for queries with $ne and $or with arrays

Question

Suppose I have a MongoDB collection containing documents like this:

{ "_id": ObjectId("the_object_id"),
  "type": "BLOG_POST",
  "state": "IN_PROGRESS",
  "createDate":ISODate("2017-02-15T01:01:01.000Z"),
  "users": {
      "posted": ["user1", "user2", "user3"],
      "favorited": ["user1", "user4", "user5", "user6"],
      "other_fields": "other data",
   },
   "many_more_fields": "a bunch of other data"
}

I have a query like this:

db.collection.find({"$and":[
    {"type": "BLOG_POST"},
    {"$or": [ {"users.posted":"userX"}, {"users.favorited":"userX"} ] },
    {"state": {"$ne":"COMPLETED"}}
]}).sort({"createDate":1})

The collection currently has indexes only on the _id field and on some fields that appear in neither this query nor this example.

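In terms of cardinality: documents with type=BLOG_POST make up about 75% of the collection, documents with state $ne "COMPLETED" make up about 50%, and any given user appears in users.posted or users.favorited in at most 2% of the collection.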

What is the best index, or set of indexes, for this use case?

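As I understand it, we cannot index both users.posted and users.favorited in the same index because both are arrays. In the future we may create a new array, users.userswhocare, that holds the union of the two fields, but assume we cannot make that change in the short term.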

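I also believe that the $ne on state means an index on state will generally not be used. Can the query planner use a state field at the end of an index to handle the $ne condition?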

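My idea was an index on {"type":1, "createDate":1, "state":1}, so that the query would match on type, use createDate for the sort, and handle the $ne with the last field of the index. It would still have to fetch 35%-40% of the documents to test them for the user. Not great, but an improvement over the current collection scan.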
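The 35%-40% figure can be reproduced by multiplying the two fractions quoted earlier. This is a minimal sketch in plain JavaScript; the function name is hypothetical, and it assumes that type and state are statistically independent, which the question does not state.

```javascript
// Estimate the fraction of the collection that an index on
// {type, createDate, state} would still have to fetch.
// Assumption: the type and state predicates are independent.
function estimateFetchedFraction(typeFraction, stateFraction) {
  return typeFraction * stateFraction;
}

// 75% of documents are BLOG_POST, 50% are not COMPLETED (from the question).
var fetched = estimateFetchedFraction(0.75, 0.5);
console.log(fetched); // 0.375
```

Under that assumption roughly 37.5% of the documents are fetched, while at most 2% of the collection can match the user condition.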

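Alternatively, I could create two indexes: {"users.posted":1, "type":1, "createDate":1, "state":1} and {"users.favorited":1, "type":1, "createDate":1, "state":1}. Would the query planner use the intersection of these two indexes to find the documents of interest more quickly?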

We are currently on MongoDB 3.2.5. If the answer differs between MongoDB 3.2 and 3.4, I would like to know how.

Answer 1

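After some analysis, I found that adding two indexes, one with users.posted and one with users.favorited as the first field, performed better and was the option chosen by the MongoDB query planner.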

I created the following indexes:

db.collection.createIndex({"users.posted":1, "type":1, "createDate":1, "state":1})
db.collection.createIndex({"users.favorited":1, "type":1, "createDate":1, "state":1})

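Because of the high cardinality of both users.posted and users.favorited (either one matches no more than 2% of the collection, and most of the time less than 0.5%), the MongoDB query planner used both indexes together, via index intersection.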

I tested this against an index like:

db.collection.createIndex({"type":1, "createDate":1, "state":1})

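Looking at the explain plans in both cases, using explain() and explain("executionStats"), the query planner used index scans for the {"$or": [ {"users.posted":"userX"}, {"users.favorited":"userX"} ] } part of the query as the first stage. This resulted in the lowest totalKeysExamined and totalDocsExamined.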
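One way to make this comparison repeatable is to reduce each explain document to the indexes it scans and the two counters mentioned above. The sketch below is plain JavaScript that runs under Node.js. The helper names collectIndexScans and summarizeExplain are hypothetical, and sampleExplain is a hand-written object with the general shape of explain output, not output captured from a real server.

```javascript
// Walk a plan tree and collect the index name of every IXSCAN stage.
function collectIndexScans(stage, found) {
  found = found || [];
  if (!stage) {
    return found;
  }
  if (stage.stage === "IXSCAN") {
    found.push(stage.indexName);
  }
  // Single-child stages use inputStage; stages such as OR use inputStages.
  if (stage.inputStage) {
    collectIndexScans(stage.inputStage, found);
  }
  (stage.inputStages || []).forEach(function (child) {
    collectIndexScans(child, found);
  });
  return found;
}

// Reduce an explain("executionStats") document to the values being compared.
function summarizeExplain(explainOutput) {
  var stats = explainOutput.executionStats || {};
  return {
    indexes: collectIndexScans(explainOutput.queryPlanner.winningPlan),
    totalKeysExamined: stats.totalKeysExamined,
    totalDocsExamined: stats.totalDocsExamined
  };
}

// Hand-written sample: the stage layout and the counts are invented
// for illustration and are not measurements.
var sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "SORT",
      inputStage: {
        stage: "FETCH",
        inputStage: {
          stage: "OR",
          inputStages: [
            { stage: "IXSCAN", indexName: "users.posted_1_type_1_createDate_1_state_1" },
            { stage: "IXSCAN", indexName: "users.favorited_1_type_1_createDate_1_state_1" }
          ]
        }
      }
    }
  },
  executionStats: { totalKeysExamined: 12, totalDocsExamined: 10 }
};

var summary = summarizeExplain(sampleExplain);
console.log(JSON.stringify(summary.indexes));
```

In the mongo shell, the same two functions could be applied to the result of db.collection.find(...).sort({"createDate":1}).explain("executionStats"), once per candidate index setup, and the summaries compared side by side.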

mongodb

mongodb-query

mongodb-indexes