Strange behaviour with vertex-centric indexes

Question

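I can't figure out how to use vertex-centric indexes correctly in ArangoDB.

In my cooking app, I have the following graph schema: (recipe)-[hasConstituent]->(ingredient)

Let's say I want all recipes that need less than 0 g of carrots. The result will be empty, of course.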

FOR recipe, constituent, p IN INBOUND 'ingredients/carrot' hasConstituent
    FILTER constituent.quantity.value < 0
    RETURN recipe._key

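400,000 recipes are linked to carrot, and this query takes ~3.9 s. Fine.

Now I create a vertex-centric index on the hasConstituent collection over the attributes _to and quantity.value, with an estimated selectivity of 100%.

I expected it to keep the index entries sorted in numeric order and so significantly speed up FILTER or SORT/LIMIT requests, but now the previous request takes ~7.9 s... If I make the index "sparse", it takes the same time as without an index (~3.9 s).

What am I missing here?

The hardest part to understand is that the execution plan given by the explain result is different from the profile result. Here is the explain output; everything looks fine, and the result should be returned instantly: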

Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  5   TraversalNode        1     - FOR recipe  /* vertex */, constituent  /* edge */ IN 1..1  /* min..maxPathDepth */ INBOUND 'ingredients/carrot' /* startnode */  hasConstituent
  6   CalculationNode      1       - LET #8 = (constituent.`quantity`.`value` < 0)   /* simple expression */
  7   FilterNode           1       - FILTER #8
  8   CalculationNode      1       - LET #10 = recipe.`_key`   /* attribute expression */
  9   ReturnNode           1       - RETURN #10

But in the profile:

Execution plan:
 Id   NodeType          Calls    Items   Runtime [s]   Comment
  1   SingletonNode         1        1       0.00000   * ROOT
  5   TraversalNode       433   432006       7.64893     - FOR recipe  /* vertex */, constituent  /* edge */ IN 1..1  /* min..maxPathDepth */ INBOUND 'ingredients/carrot' /* startnode */  hasConstituent
  6   CalculationNode     433   432006       0.28761       - LET #8 = (constituent.`quantity`.`value` < 0)   /* simple expression */
  7   FilterNode            1        0       0.08704       - FILTER #8
  8   CalculationNode       1        0       0.00000       - LET #10 = recipe.`_key`   /* attribute expression */
  9   ReturnNode            1        0       0.00001       - RETURN #10

To be precise, the index is used in both results:

Indexes used:
 By   Name              Type         Collection       Unique   Sparse   Selectivity   Fields                        Ranges
  5   recipeByIngrQty   persistent   hasConstituent   false    false       100.00 %   [ `_to`, `quantity.value` ]   base INBOUND

Any help is very welcome.

Answer 1

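For a traversal FOR vertex, edge, path IN ..., a filter on vertex or edge applies only to the results, not to what is actually visited during the traversal. To see why this makes sense, keep in mind that usually not all vertices or edges visited during a traversal are part of the result: for example, if min in IN min..max is greater than zero (by default it is one), vertices at a smaller distance than that (and their incoming edges) are not part of the result, but they still have to be visited.

That is why, if you want to restrict the edges visited during the traversal, you have to put the filter on the path variable. For example: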

FOR recipe, constituent, p IN INBOUND 'ingredients/carrot' hasConstituent
    FILTER p.edges[*].quantity.value ALL < 0
    RETURN recipe._key

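This should use the index as you expect. For details, see the documentation on vertex-centric indexes and on AQL graph traversals.

I think that answers the core of your question; now to clear up some of the issues you ran into along the way.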
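To make the difference concrete, here is a minimal Python sketch. It is a toy model, not ArangoDB code: the edge list, the function names post_filter and indexed_filter, and the sorted-list "index" are all assumptions made for illustration, and every toy edge carries a numeric quantity. It contrasts reading every inbound edge and filtering afterwards with a range scan over entries sorted by (_to, quantity.value), which is roughly what the path filter allows the traversal to do.

```python
from bisect import bisect_left

# Toy edge collection: (recipe, _to, quantity.value). All names are illustrative.
edges = [(f"recipes/{i}", "ingredients/carrot", 10 + i % 500) for i in range(1000)]
edges += [(f"recipes/x{i}", "ingredients/leek", 5) for i in range(10)]


def post_filter(target, limit):
    """Model of a filter on the edge variable: read every inbound edge, then filter."""
    visited = [e for e in edges if e[1] == target]  # every edge of the vertex is read
    result = [e[0] for e in visited if e[2] < limit]
    return result, len(visited)


# Model of a vertex-centric index: entries kept sorted by (_to, quantity.value).
index = sorted((to, qty, recipe) for recipe, to, qty in edges)


def indexed_filter(target, limit):
    """Model of a filter on the path variable: a range scan over the sorted index."""
    lo = bisect_left(index, (target,))        # first entry for this vertex
    hi = bisect_left(index, (target, limit))  # first entry with value >= limit
    scanned = index[lo:hi]
    return [recipe for _, _, recipe in scanned], len(scanned)


if __name__ == "__main__":
    print(post_filter("ingredients/carrot", 0)[1])     # edges read without pushdown
    print(indexed_filter("ingredients/carrot", 0)[1])  # edges read with the range scan
```

In this model both functions return the same recipes, but the second one only touches the index entries that can match, which is the behaviour the question expected from the index.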

I expected it to sort indexes in a numeric order, and then to significantly increase the speed of FILTER or SORT/LIMIT requests, but now the previous request takes ~7.9s... If I make the index "sparse", it takes the same time as without index (~3.9s)

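There are two things here.

First, it sounds like the optimizer prefers your index over the edge index. That probably should not be the case, because (without the change I described above) it is no more specific than the edge index, but slightly slower. You did not specify which ArangoDB version you are using, so I cannot comment specifically. However, if you are on the latest patch release of a supported minor version, e.g. 3.7.10 or 3.6.12 at the time of writing, you could report this as an issue on GitHub.

Second, sparse indexes do not index non-existent or null values. Such an index therefore cannot be used for queries that could report null values. Now note that null < 0 is true; see "type and value order" in the documentation for details. So your query constituent.quantity.value < 0 could report null values, which is why the sparse index is treated differently (i.e. it cannot be used at all).

Now to the last point: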
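The interaction between null ordering and sparse indexes can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not ArangoDB's implementation: aql_key and aql_less are hypothetical helpers that cover only null, booleans, numbers and strings (arrays and objects are left out, and string comparison is plain Python rather than AQL's), and the edge values are made up.

```python
def aql_key(value):
    """Sort key modelling AQL's documented type order: null < bool < number < string."""
    if value is None:
        return (0, 0)
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return (1, value)
    if isinstance(value, (int, float)):
        return (2, value)
    if isinstance(value, str):
        return (3, value)
    raise TypeError("arrays and objects are omitted from this sketch")


def aql_less(a, b):
    """Model of the AQL expression a < b across types."""
    return aql_key(a) < aql_key(b)


# Hypothetical quantity.value per edge; None stands for a missing or null attribute.
edge_values = [None, 250, None, 30]

# What the query `value < 0` has to report: the null entries qualify.
matches = [v for v in edge_values if aql_less(v, 0)]

# A sparse index skips null entries, so a lookup in it would miss them.
sparse_index = [v for v in edge_values if v is not None]
from_sparse = [v for v in sparse_index if aql_less(v, 0)]

if __name__ == "__main__":
    print(matches)      # the two null entries
    print(from_sparse)  # empty: the sparse index cannot answer this query
```

Because the two result lists differ, answering the query from the sparse index would silently drop matching edges, which is consistent with the optimizer declining to use it.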

The most hard part to understand is that the execution plan given by the explain result is different from the profile result.

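The explain output shows the column "Est.", which is an estimate of the number of rows/iterations this node will emit/do. In contrast, the "Items" column in the profile output is the corresponding exact number. This estimate can be good in some cases and bad in others. That is not necessarily a problem: it cannot be exact without actually executing the query. If it does cause a problem, because the estimate leads the optimizer to choose the wrong index for the job, you can use index hints. But that is not the case here.

Apart from that, the two plans you showed appear to be exactly identical.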

arangodb