Neo4j SDN 4 GraphId performance vs Index

In my Neo4j/SDN 4 application, all of my Cypher queries are based on internal Neo4j IDs.

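This is a problem, because I can't rely on these IDs in my web application URLs. Neo4j can reuse these IDs, so it is quite possible that at some point in the future a completely different node will be found under the same ID.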

I tried to reimplement this logic based on the following solution: Using the graph to control unique id generation, but noticed that query performance degraded.
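As I understand it, the linked approach keeps a counter in the graph and increments it every time a new entity is created, so a uid is never handed out twice. Below is a minimal in-memory sketch of that idea in Java. The class name `UidCounter`, the counter names and the `:UniqueId` label are hypothetical, and the Cypher in the comment is only roughly what such a counter node looks like, not necessarily the exact statement from the linked article.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * In-memory model of the "counter node" pattern: one counter per name,
 * incremented atomically, so every uid handed out is new and never reused.
 * In the graph the counter would live on a node, roughly (hypothetical names):
 *   MERGE (c:UniqueId {name: {name}})
 *   ON CREATE SET c.count = 1
 *   ON MATCH SET c.count = c.count + 1
 *   RETURN c.count AS uid
 */
class UidCounter {
    private final Map<String, Long> counters = new ConcurrentHashMap<>();

    /** Returns the next uid for the given counter name, starting at 1. */
    long next(String name) {
        // ConcurrentHashMap.merge runs atomically per key: insert 1 or add 1
        return counters.merge(name, 1L, Long::sum);
    }

    public static void main(String[] args) {
        UidCounter ids = new UidCounter();
        System.out.println(ids.next("BaseEntity")); // 1
        System.out.println(ids.next("BaseEntity")); // 2
        System.out.println(ids.next("User"));       // 1
    }
}
```

Unlike internal IDs, these values only ever grow, so a uid that once appeared in a URL can never point at a different node later.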

From a theoretical point of view, should a Cypher query based on a property annotated with @Index(unique = true, primary = true), for example:

@Index(unique = true, primary = true)
private Long uid;

entity.uid = {someId}

have the same performance as a Cypher query based on the internal Neo4j ID:

id(entity) = {someId} 

UPDATED

This is the :schema output:

Indexes
   ON :BaseEntity(uid) ONLINE
   ON :Characteristic(lowerName) ONLINE
   ON :CharacteristicGroup(lowerName) ONLINE
   ON :Criterion(lowerName) ONLINE
   ON :CriterionGroup(lowerName) ONLINE
   ON :Decision(lowerName) ONLINE
   ON :FlagType(name) ONLINE (for uniqueness constraint)
   ON :HAS_VALUE_ON(value) ONLINE
   ON :HistoryValue(originalValue) ONLINE
   ON :Permission(code) ONLINE (for uniqueness constraint)
   ON :Role(name) ONLINE (for uniqueness constraint)
   ON :User(email) ONLINE (for uniqueness constraint)
   ON :User(username) ONLINE (for uniqueness constraint)
   ON :Value(value) ONLINE

Constraints
   ON ( flagtype:FlagType ) ASSERT flagtype.name IS UNIQUE
   ON ( permission:Permission ) ASSERT permission.code IS UNIQUE
   ON ( role:Role ) ASSERT role.name IS UNIQUE
   ON ( user:User ) ASSERT user.email IS UNIQUE
   ON ( user:User ) ASSERT user.username IS UNIQUE

As you can see, I have an index on :BaseEntity(uid).

BaseEntity is the base class of my entity hierarchy, for example:

@NodeEntity
public abstract class BaseEntity {

    @GraphId
    private Long id;

    @Index(unique = false)
    private Long uid;

    private Date createDate;

    private Date updateDate;

...

}

@NodeEntity
public class Commentable extends BaseEntity {
...
}

@NodeEntity
public class Decision extends Commentable {

    private String name;

}

Will this uid index be used when I look up (d:Decision) WHERE d.uid = {uid}?

PROFILE results - internal ID vs indexed property

Query based on internal IDs

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE id(parentD) = 1474333 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE id(filterCharacteristic1475199) = 1475199 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE id(filterCharacteristic1474358) = 1474358 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE id(filterCharacteristic1475193) = 1475193 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE id(c) IN [1474342, 1474343, 1474340, 1474339, 1474336, 1474352, 1474353, 1474350, 1474351, 1474348, 1474346, 1474344] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

PROFILE output:

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 350554 total db hits in 238 ms.

Query based on the indexed uid property

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE parentD.uid = 61 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE filterCharacteristic1475199.uid = 15 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE filterCharacteristic1474358.uid = 10 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE filterCharacteristic1475193.uid = 14 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE c.uid IN [26, 27, 24, 23, 20, 36, 37, 34, 35, 32, 30, 28] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 671326 total db hits in 426 ms.

Is there any way to improve the performance of the uid-based query?

You are right not to use Neo4j internal ids in web URLs, because they can be reused after nodes are deleted, etc.

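From a performance point of view, internal ids are as fast as it gets: an id is effectively the offset of the node/relationship record in the store file (you may have noticed that these are 2 separate id sequences, so you can have a node with id=x and a relationship with the same id=x).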
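To make the reuse problem concrete, here is a toy Java model of an allocator that recycles the ids of deleted records, with separate allocators for nodes and relationships. It is a simplification under my own assumptions: the class name is hypothetical, and Neo4j's real allocator has its own rules about when a freed id becomes available again. It is not Neo4j's store code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model of an id allocator that recycles freed ids, showing why an
 * internal id stored in a URL can later resolve to a different node.
 */
class RecyclingIdAllocator {
    private long nextId = 0;
    private final Deque<Long> freed = new ArrayDeque<>();

    /** Hands out a previously freed id if there is one, otherwise a fresh id. */
    long allocate() {
        Long reused = freed.poll();
        return reused != null ? reused : nextId++;
    }

    /** Called when a record is deleted; its id becomes available again. */
    void free(long id) {
        freed.add(id);
    }

    public static void main(String[] args) {
        // Nodes and relationships have separate sequences, so both can use id 0.
        RecyclingIdAllocator nodeIds = new RecyclingIdAllocator();
        RecyclingIdAllocator relIds = new RecyclingIdAllocator();
        long alice = nodeIds.allocate(); // 0
        long bob = nodeIds.allocate();   // 1
        long knows = relIds.allocate();  // 0, same number as alice
        nodeIds.free(bob);               // bob is deleted
        long carol = nodeIds.allocate(); // 1 again: an old URL with id 1 now finds carol
        System.out.println(alice + " " + bob + " " + knows + " " + carol); // 0 1 0 1
    }
}
```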

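Any use of an index has to be slower, because the database first performs an index lookup to get the internal id and then reads the node record.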
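A simplified Java model of the two lookup paths described above. `LookupCostModel` and its hit counting are hypothetical and do not reproduce Neo4j's db-hit accounting; the sketch only shows that the uid path puts one index step in front of the same record read, i.e. a constant extra cost per lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified cost model: an internal id works like an offset into the record
 * store, while a lookup by uid first goes through an index to find that id.
 */
class LookupCostModel {
    private final List<String> records = new ArrayList<>();      // position = internal id
    private final Map<Long, Integer> uidIndex = new HashMap<>(); // uid -> internal id
    int hits = 0;

    /** Stores a record and indexes it by uid; returns its internal id. */
    int create(long uid, String payload) {
        records.add(payload);
        int internalId = records.size() - 1;
        uidIndex.put(uid, internalId);
        return internalId;
    }

    /** Like id(n) = {id}: one direct record read at the given offset. */
    String byInternalId(int internalId) {
        hits++;
        return records.get(internalId);
    }

    /** Like n.uid = {uid}: one index lookup, then the same record read. */
    String byUid(long uid) {
        hits++; // the index lookup
        Integer internalId = uidIndex.get(uid);
        return internalId == null ? null : byInternalId(internalId);
    }

    public static void main(String[] args) {
        LookupCostModel store = new LookupCostModel();
        int id = store.create(61L, "Decision A");
        store.byInternalId(id);
        System.out.println(store.hits); // 1
        store.hits = 0;
        store.byUid(61L);
        System.out.println(store.hits); // 2
    }
}
```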

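However, for the vast majority of applications the performance difference is negligible, probably much smaller than network latency or the general OGM overhead.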

If you do see a significant difference:

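  • verify that the index exists in the database (e.g. :schema in the Neo4j Browser)
  • turn on logging and verify that your queries have the correct labels (set the info level for org.neo4j.ogm)
  • if the index exists and the query contains the correct labels, use PROFILE to check the query plan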
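For the first check, reading the :schema output in the Browser is usually enough. If you want to check it in code from a copy of that text, a small hypothetical Java helper like the one below can look for an ONLINE index on a given label and property; the class and method names are my own, and the line format is assumed to match the :schema output shown in the question.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical check over the text printed by :schema:
 * is there an ONLINE index for the given label and property?
 */
final class SchemaCheck {
    private SchemaCheck() {}

    static boolean hasOnlineIndex(List<String> schemaLines, String label, String property) {
        // Including "(" and ")" avoids matching a prefix such as :Base vs :BaseEntity
        String target = "ON :" + label + "(" + property + ")";
        for (String line : schemaLines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(target) && trimmed.contains("ONLINE")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList(
                "   ON :BaseEntity(uid) ONLINE",
                "   ON :Decision(lowerName) ONLINE");
        System.out.println(hasOnlineIndex(schema, "BaseEntity", "uid")); // true
        System.out.println(hasOnlineIndex(schema, "User", "uid"));       // false
    }
}
```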

UPDATED

Yes, the index will be used for the following query:

MATCH (d:Decision) WHERE d.uid = {uid} ...

which should be generated by

session.load(Decision.class, uid)

if your index is primary, or by findByUid in DecisionRepository.

Note that the index may not be used when the WHERE clause appears in the middle of the query:

...
WITH x
MATCH (x)-[...]-(d) WHERE d.uid = {uid} ...

It depends on the query plan; use PROFILE to investigate it.
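One way to give the planner a chance to start from index seeks is to resolve all uid-identified nodes in a leading MATCH, each with its label, and only then traverse. The hypothetical Java builder below sketches that rewrite. It assumes the nodes carry the :BaseEntity label (the label of the existing uid index) and that uid values are unique, so the comma-separated patterns yield at most one row. Whether the planner actually uses the index still has to be confirmed with PROFILE.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical query builder: bind every uid-identified node in one leading
 * MATCH (labelled, so a label-based index is usable), then carry the bound
 * variables forward with WITH before the traversal part of the query.
 */
final class UidFirstQuery {
    private UidFirstQuery() {}

    /** lookups maps a variable name to the parameter holding its uid. */
    static String prefix(String label, Map<String, String> lookups) {
        if (lookups.isEmpty()) {
            throw new IllegalArgumentException("at least one uid lookup is required");
        }
        StringJoiner patterns = new StringJoiner(", ", "MATCH ", "");
        StringJoiner predicates = new StringJoiner(" AND ", " WHERE ", "");
        StringJoiner carried = new StringJoiner(", ", " WITH ", "");
        for (Map.Entry<String, String> e : lookups.entrySet()) {
            patterns.add("(" + e.getKey() + ":" + label + ")");
            predicates.add(e.getKey() + ".uid = {" + e.getValue() + "}");
            carried.add(e.getKey());
        }
        return patterns.toString() + predicates + carried;
    }

    public static void main(String[] args) {
        Map<String, String> lookups = new LinkedHashMap<>(); // keeps insertion order
        lookups.put("parentD", "parentUid"); // hypothetical parameter names
        lookups.put("f1", "f1Uid");
        System.out.println(prefix("BaseEntity", lookups));
        // MATCH (parentD:BaseEntity), (f1:BaseEntity) WHERE parentD.uid = {parentUid} AND f1.uid = {f1Uid} WITH parentD, f1
    }
}
```

The returned prefix would then be followed by the rest of the query, e.g. MATCH (parentD)-[:CONTAINS]->(childD:Decision) ..., with parentD already bound.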