Why does a parameterized Cypher query take longer than an unparameterized query when using the default planner?

Question

I am using neo4j-2.2.1 and querying through the Transactional Cypher RESTful endpoint. I am trying to match an Email node with the following query

match (e:Email) where e.email in ['gxxxxxxs@yyy.com'] return count(e);

The email property on Email nodes has a unique constraint, so an index was built on it automatically. When I run the query above with a query parameter, as in

{"statements": [{"parameters": {"emails": ["gxxxxxxs@yyy.com"]}, "statement": "match (c:Email) where c.email in {emails} return count(c)"}]}

it takes 23 seconds, whereas when I run it directly without any parameters, as in

{"statements": [{"statement": "match (c:Email) where c.email in ['gxxxxxxs@yyy.com'] return count(c)"}]}

it takes only 0.0134 seconds.

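I profiled both to see how the queries are executed and, to my surprise, the parameterized query does not use the unique index seek I expected.

The profiling results are as follows:

Profile for the non-parameterized query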

{"statements": [{"statement": "PROFILE match (c:Email) where c.email in ['gxxxxxxs@yyy.com'] return count(c)"}]}

{"results":[{"columns":["count(c)"],"data":[{"row":[1]}],"plan":{"root":{"operatorType":"EagerAggregation","DbHits":0,"Rows":1,"version":"CYPHER 2.2","KeyNames":"","EstimatedRows":1.0000000000279299,"planner":"COST","identifiers":["count(c)"],"children":[{"operatorType":"NodeUniqueIndexSeek","Index":":Email(email)","Rows":1,"DbHits":1,"EstimatedRows":1.00000000005586,"identifiers":["c"],"children":[]}]}}}],"errors":[]}

Completed in 0.0134048461914 seconds

Profile for the parameterized query

{"statements": [{"parameters": {"emails": ["gxxxxxx@yyy.com"]}, "statement": "PROFILE match (c:Email) where c.email in {emails} return count(c)"}]}

{"results":[{"columns":["count(c)"],"data":[{"row":[1]}],"plan":{"root":{"operatorType":"EagerAggregation","DbHits":0,"Rows":1,"version":"CYPHER 2.2","KeyNames":"","EstimatedRows":1384.8193022918188,"planner":"COST","identifiers":["count(c)"],"children":[{"operatorType":"Filter","LegacyExpression":"any(--INNER-- in {emails} where c.email == --INNER--)","Rows":1,"DbHits":5114522,"EstimatedRows":1917724.4999999998,"identifiers":["c"],"children":[{"operatorType":"NodeByLabelScan","LabelName":":Email","Rows":2557261,"DbHits":2557262,"EstimatedRows":2556966.0,"identifiers":["c"],"children":[]}]}]}}}],"errors":[]}

Completed in 23.5868499279 seconds
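
The two profiles above can also be compared mechanically rather than by eye. The following is a minimal Python sketch, assuming the plan has the shape shown in the responses above (a node with operatorType, DbHits and children, found under results[0].plan.root). The helper name summarize_plan and the trimmed sample dictionary are illustrative only and not part of any Neo4j client.

```python
def summarize_plan(node):
    """Return (operatorType, DbHits) pairs for a plan node and its descendants, depth-first."""
    rows = [(node["operatorType"], node.get("DbHits", 0))]
    for child in node.get("children", []):
        rows.extend(summarize_plan(child))
    return rows


# Trimmed copy of the parameterized profile shown above (only the fields used here).
parameterized_root = {
    "operatorType": "EagerAggregation",
    "DbHits": 0,
    "children": [{
        "operatorType": "Filter",
        "DbHits": 5114522,
        "children": [{
            "operatorType": "NodeByLabelScan",
            "DbHits": 2557262,
            "children": [],
        }],
    }],
}

summary = summarize_plan(parameterized_root)
total_db_hits = sum(hits for _, hits in summary)
# Operator name taken from the non-parameterized profile above.
uses_index_seek = any(op == "NodeUniqueIndexSeek" for op, _ in summary)

for op, hits in summary:
    print(op, hits)
print("total DbHits:", total_db_hits, "uses index seek:", uses_index_seek)
```

With a real response, the dictionary passed in would be the parsed JSON at results[0].plan.root instead of the hand-trimmed sample.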

Can someone help me understand why the parameterized Cypher query does not use the unique index seek?

Answer 1

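For your parameterized query, the Cypher engine apparently fails to infer the right schema index to use, so it performs a label scan instead. In general, the Cypher engine has to work out where in the graph to start a query by looking at the MATCH clause and the WHERE conditions and using that information to find a useful index.

You can tell it to use a specific index with USING. So if your Cypher query looked like this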

MATCH (c:Email) 
USING INDEX c:Email(email)
WHERE c.email in ['gxxxxxxs@yyy.com'] 
RETURN count(c)

you should find that the query is executed using NodeUniqueIndexSeek. The equivalent parameterized request is

{
    "statements": [{
        "parameters": {
            "emails": ["gxxxxxxs@yyy.com"]
        }, 
        "statement": "MATCH (c:Email) 
                      USING INDEX c:Email(email) 
                      WHERE c.email IN {emails} 
                      RETURN count(c)"
    }]
}

(Line breaks added for readability.)
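
Because a JSON string cannot contain raw line breaks, the request above has to be sent with the statement on a single line. A minimal Python sketch of one way to build that body follows; it assumes only the payload shape already used in this thread, and the function name build_payload is hypothetical. How the body is then sent to the transactional endpoint is left out.

```python
import json


def build_payload(emails):
    """Build a request body for the index-hinted, parameterized statement."""
    # Adjacent string literals are joined into one line, so the readable
    # layout here does not put line breaks into the statement itself.
    statement = (
        "MATCH (c:Email) "
        "USING INDEX c:Email(email) "
        "WHERE c.email IN {emails} "
        "RETURN count(c)"
    )
    return {
        "statements": [{
            "statement": statement,
            "parameters": {"emails": list(emails)},
        }]
    }


body = json.dumps(build_payload(["gxxxxxxs@yyy.com"]))
print(body)
```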

Answer 2

This is a reported bug in Cypher since 2.2+ (indexes are not used with parameters).

https://github.com/neo4j/neo4j/issues/4357

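Until the bug is fixed, you can work around it by prefixing the query with PLANNER RULE, which makes Cypher use the previous, rule-based planner and restores the performance.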

PLANNER RULE MATCH (e:Email) where e.email IN {emails} RETURN count(e);
