DocumentDb GUID Index Precision

Question

Suppose our documents contain a non-unique GUID/UUID value:

[
  {
    "id": "123456",
    "Key": "117dfd49-a71d-413b-a9b1-841e88db06e8",
    "Name": "Kaapstad"
  },
  ...
]

We only ever want to query this value by equality. No range or ORDER BY queries are needed. For example:

SELECT * FROM c where c.Key = "117dfd49-a71d-413b-a9b1-841e88db06e8"

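Below is the index definition. It is a hash index (since no range queries will be performed) using the String data type (since JavaScript does not natively support GUIDs).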

collection.IndexingPolicy.IncludedPaths.Add(
    new IncludedPath { 
        Path = "/Key/?", 
        Indexes = new Collection<Index> { 
            new HashIndex(DataType.String) { Precision = -1 }
        }
    });

But what is the best index precision?

This MSDN page does not make it clear to me which precision value is best suited to a value like this:

Index precision configuration is more useful with string ranges. Since strings can be any arbitrary length, the choice of the index precision can impact the performance of string range queries, and impact the amount of index storage space required. String range indexes can be configured with 1-100 or -1 ("maximum"). If you would like to perform Order By queries against string properties, then you must specify a precision of -1 for the corresponding paths.

Answer 1

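You can fine-tune the index precision value based on the number of documents you expect to contain the indexed path (which happens to be the Key property in your example).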

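The index precision of a hash index is the number of bytes the property value is hashed into. Lowering the precision therefore reduces the storage needed for the index, while raising it (in the context of a hash index) helps guard against hash collisions in the index.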

For example, suppose the hash index precision on the path foo is 3.

3 bytes = 3 * 8 = 24 bits.

24 bits can represent 2^24 = 16,777,216 distinct values.
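To make the arithmetic concrete, here is a small Python sketch that tabulates how many distinct hash values each precision can represent. The function name `hash_space` is hypothetical and is not part of any DocumentDB SDK. It simply assumes, as described above, that a precision of p bytes gives 2^(8p) possible hash values; the 1 to 8 byte range in the loop is likewise only an illustrative assumption.

```python
def hash_space(precision_bytes: int) -> int:
    """Number of distinct values representable in `precision_bytes` bytes.

    Hypothetical helper for illustration; assumes 8 bits per byte of precision.
    """
    if precision_bytes < 1:
        raise ValueError("precision must be a positive number of bytes")
    return 2 ** (8 * precision_bytes)


if __name__ == "__main__":
    # Illustrative range only; check the service documentation for valid values.
    for p in range(1, 9):
        print(f"{p} byte(s): {hash_space(p):,} distinct hash values")
```

Each extra byte of precision multiplies the hash space by 256, which is why small changes in precision have a large effect on collision behaviour.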

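By the pigeonhole principle, you are guaranteed to run into hash collisions once you store more than 16,777,216 documents with the foo property. When a hash collision occurs, DocumentDB has to scan the subset of documents that share the hash. For example, with 30,000,000 documents that have a foo property, there are 30,000,000 / 16,777,216 ≈ 1.8 documents per hash value, so you can expect to scan about 2 documents on average.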
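The estimate above can be sketched in Python. This is a back-of-the-envelope model, not DocumentDB's actual implementation: it assumes hashes are spread uniformly over the hash space and that each document carries a distinct value. The helper names `docs_per_hash_value` and `min_precision_for` are made up for illustration.

```python
def docs_per_hash_value(doc_count: int, precision_bytes: int) -> float:
    """Average number of documents per possible hash value.

    Assumption: hashes are uniformly distributed over 2^(8 * precision) values.
    """
    return doc_count / (2 ** (8 * precision_bytes))


def min_precision_for(doc_count: int) -> int:
    """Smallest precision (in bytes) whose hash space holds `doc_count` values.

    Below this precision the pigeonhole principle guarantees collisions.
    """
    if doc_count < 1:
        raise ValueError("doc_count must be positive")
    bits_needed = (doc_count - 1).bit_length()  # bits to number doc_count items
    return max(1, -(-bits_needed // 8))         # ceiling division to whole bytes


if __name__ == "__main__":
    n = 30_000_000
    print(f"precision 3: {docs_per_hash_value(n, 3):.2f} documents per hash value")
    print(f"smallest precision without guaranteed collisions: {min_precision_for(n)}")
```

For 30,000,000 documents at precision 3 the ratio is roughly 1.79, matching the "about 2 documents" estimate. Under the same assumptions, 4 bytes is the smallest precision at which collisions are no longer guaranteed, although they can still occur by chance.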

