How to calculate the count of regex query results?

Question

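I have a large MongoDB collection (approximately 30 million records), and each item has an array of unique numbers, each consisting of 8 digits. Most of the time that array has only 1 element (1 number). For example, I want to find out how many records in the collection have a number that starts with 4, so I query: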

{ "numbers.number": /^4.*/i }

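But the query takes too long; the last time I ran it, I interrupted it after 20 minutes. So I am wondering whether there is a way to optimize the query. numbers.number is indexed. I also tried this: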

{ "numbers.number": /^4[0-9]{7}/}

It still takes a long time. Here is a sample document:

{ 
    "_id" : ObjectId("some_id"), 
    "created_at" : ISODate("2022-10-13T09:32:45.000+0000"), 
    "source" : {
        "created_at" : ISODate("2021-10-13T08:54:06.000+0000"), 
        "some_id" : NumberInt(234), 
        "another_id" : NumberInt(11)
    }, 
    "first_name" : "Test", 
    "last_name" : "Test", 
    "date_of_birth" : "1970-01-01", 
    "status" : "active", 
    "numbers" : [
        {
            "created_at" : ISODate("2022-11-13T09:32:45.000+0000"), 
            "number" : "40000005", 
            "_id" : ObjectId("some_id")
        }
    ]
}

Answer 1

Regular expressions are expensive in terms of performance and speed, with or without an index, when you have millions of documents.

Here is a similar question: MongoDB, performance of query by regular expression on indexed fields

I'm not sure, as I have not compared or tested the performance, but try the ^ anchor without the trailing .*:

{ "numbers.number": /^4/ }

According to the additional note in MongoDB's regex index use documentation:

Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.
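The claim that the three forms match the same strings can be checked in plain JavaScript. This is only a minimal sketch of the matching behaviour, not of index performance. The sample values are made up for illustration, apart from "40000005", which comes from the question.

```javascript
// Hypothetical sample values; only "40000005" appears in the question.
const samples = ["40000005", "41234567", "50000000", "14000000"];

// The three prefix forms from the quoted note, adapted to the digit 4.
const patterns = [/^4/, /^4.*/, /^4.*$/];

// For each pattern, collect the samples it matches.
const matches = patterns.map((re) => samples.filter((s) => re.test(s)));

console.log(matches);
```

Since all three patterns select the same values, dropping the .* suffix should not change the count, only how much work the server may have to do.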

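Second option: if you know the range of the numbers, I suggest using the $gte and $lt operators to find a specific series by specifying its bounds. Because number is stored as a string in the sample document, the bounds are strings as well and are compared as strings, which works here as long as every value has exactly 8 digits: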

{ 
  "numbers.number": {
    "$gte": "40000000",
    "$lt": "50000000"
  }
}
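If the leading digits vary, the bounds can be generated rather than typed by hand. The helper below is a hypothetical sketch (prefixRange is not part of MongoDB or any driver). It assumes every value is a fixed-width string of digits, as in the sample document.

```javascript
// Hypothetical helper: string bounds covering every fixed-width digit string
// that starts with `prefix`. Assumes values contain digits only.
function prefixRange(prefix, width) {
  const lower = prefix.padEnd(width, "0");
  if (/^9+$/.test(prefix)) {
    // No larger prefix of the same length exists, so leave the range open-ended.
    return { $gte: lower };
  }
  // Increment the prefix, keeping any leading zeros (e.g. "04" -> "05").
  const next = String(Number(prefix) + 1).padStart(prefix.length, "0");
  return { $gte: lower, $lt: next.padEnd(width, "0") };
}

// Filter equivalent to the range query above.
const filter = { "numbers.number": prefixRange("4", 8) };
console.log(JSON.stringify(filter));
```

Note that numbers is an array, so without $elemMatch the two bounds may be satisfied by different array elements. With mostly single-element arrays this is unlikely to matter, but it is worth confirming against your data.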

Third, you can use the $or operator to check multiple ranges:

{ 
  "$or": [
    {
      "numbers.number": {
        "$gte": "4000000",
        "$lt": "5000000"
      }
    },
    {
      "numbers.number": {
        "$gte": "40000000",
        "$lt": "50000000"
      }
    }
  ]
}
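One thing to keep in mind with string bounds of different lengths is that they are compared character by character, not numerically. The sketch below uses JavaScript string comparison as a stand-in for the server's ordering of plain digit strings (an assumption; collation settings could change this). The 7-digit value is hypothetical.

```javascript
// Stand-in for a range predicate on string values: lower <= value < upper.
const inRange = (value, lower, upper) => value >= lower && value < upper;

const eightDigit = "40000005"; // from the sample document
const sevenDigit = "4123456";  // hypothetical 7-digit value

// Check each value against the two $or branches shown above.
const checks = {
  eightInFirstBranch: inRange(eightDigit, "4000000", "5000000"),
  eightInSecondBranch: inRange(eightDigit, "40000000", "50000000"),
  sevenInSecondBranch: inRange(sevenDigit, "40000000", "50000000"),
};

console.log(checks);
```

All three checks come out true under string ordering, so the two branches overlap when the values are stored as strings. Separate branches per length would only be distinct if the field held actual numbers.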

NOTE:

Try executing this query in the MongoDB shell.

Always use the count functions if you only need the number of matching documents:

db.coll.find({query}).count()

db.coll.countDocuments({query})
