Neo4j:标签与索引 属性?
Neo4j: label vs. indexed property?
假设您是 Twitter,并且:
- 您有
(:User)
和 (:Tweet)
个节点;
- 推文可能会被标记;和
- 您想查询列表 当前正在等待审核的已标记推文。
您可以为这些推文添加 标签,例如:AwaitingModeration
,或添加索引 a 属性,例如isAwaitingModeration = true|false
.
一种选择是否天生就比另一种更好?
我知道最好的答案可能是尝试对两者进行负载测试 :),但是 Neo4j 的实施 POV 是否有任何东西可以使一个选项更健壮或更适合这种查询?
它是否取决于在任何给定时刻处于此状态的推文数量?如果它在 10 年代与 1000 年代,那有什么不同吗?
我的印象是标签更适合大量的节点,而索引属性更适合较小的数量(理想情况下,唯一节点),但我不确定这是否真的如此。
谢谢!
更新:跟进 blog post 已发布。
这是我们为客户建模数据集时的常见问题,也是 Active/NonActive 实体的典型用例。
这是我对 Neo4j2.1.6 有效的体验的一点反馈:
点 1. 在标签或索引 属性 和 return 节点匹配之间,数据库访问不会有差异
点2.当这样的节点在模式的末尾时会遇到差异,例如
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;
-
PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION153 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 3 | | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == { AUTOBOOL1}) |
| SimplePatternMatcher | 1 | 12 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
Total database accesses: 17
在这种情况下,Cypher 将不会使用索引 :Post(published)
。
因此,如果您有一个 ActivePost 标签,例如,标签的使用性能更高。 :
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION130 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 1 | | hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher | 1 | 4 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 7
第 3 点。 始终使用标签表示正例,这意味着对于上述情况,具有 Draft 标签将强制您执行以下查询:
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post :Draft
RETURN n, collect(post) as posts;
意味着 Cypher 将打开每个节点标签 headers 并对其进行过滤。
要点 4. 避免需要匹配多个标签
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION139 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 2 | | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher | 1 | 8 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
Total database accesses: 12
这将对 Cypher 产生与第 3 点相同的过程。
第 5 点。 如果可能,通过正确键入命名关系来避免在标签上进行匹配
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts
-
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 3 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION119 |
| EagerAggregation | 1 | 0 | | n |
| SimplePatternMatcher | 3 | 0 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 2
性能会更高,因为它将使用图表的所有功能,并且只遵循来自节点的关系,导致数据库访问不多于匹配用户节点,因此不过滤标签。
这是我的 0,02€
假设您是 Twitter,并且:
- 您有
(:User)
和(:Tweet)
个节点; - 推文可能会被标记;和
- 您想查询列表 当前正在等待审核的已标记推文。
您可以为这些推文添加 标签,例如:AwaitingModeration
,或添加索引 a 属性,例如isAwaitingModeration = true|false
.
一种选择是否天生就比另一种更好?
我知道最好的答案可能是尝试对两者进行负载测试 :),但是 Neo4j 的实施 POV 是否有任何东西可以使一个选项更健壮或更适合这种查询?
它是否取决于在任何给定时刻处于此状态的推文数量?如果它在 10 年代与 1000 年代,那有什么不同吗?
我的印象是标签更适合大量的节点,而索引属性更适合较小的数量(理想情况下,唯一节点),但我不确定这是否真的如此。
谢谢!
更新:跟进 blog post 已发布。
这是我们为客户建模数据集时的常见问题,也是 Active/NonActive 实体的典型用例。
这是我对 Neo4j2.1.6 有效的体验的一点反馈:
点 1. 在标签或索引 属性 和 return 节点匹配之间,数据库访问不会有差异
点2.当这样的节点在模式的末尾时会遇到差异,例如
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;
-
PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION153 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 3 | | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == { AUTOBOOL1}) |
| SimplePatternMatcher | 1 | 12 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
Total database accesses: 17
在这种情况下,Cypher 将不会使用索引 :Post(published)
。
因此,如果您有一个 ActivePost 标签,例如,标签的使用性能更高。 :
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION130 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 1 | | hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher | 1 | 4 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 7
第 3 点。 始终使用标签表示正例,这意味着对于上述情况,具有 Draft 标签将强制您执行以下查询:
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post :Draft
RETURN n, collect(post) as posts;
意味着 Cypher 将打开每个节点标签 headers 并对其进行过滤。
要点 4. 避免需要匹配多个标签
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION139 |
| EagerAggregation | 1 | 0 | | n |
| Filter | 1 | 2 | | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher | 1 | 8 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
Total database accesses: 12
这将对 Cypher 产生与第 3 点相同的过程。
第 5 点。 如果可能,通过正确键入命名关系来避免在标签上进行匹配
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts
-
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;
neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 3 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
ColumnFilter(0)
|
+Extract
|
+ColumnFilter(1)
|
+EagerAggregation
|
+SimplePatternMatcher
|
+SchemaIndex
+----------------------+------+--------+----------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+----------------------+----------------------------------+
| ColumnFilter(0) | 1 | 0 | | keep columns n, posts |
| Extract | 1 | 0 | | posts |
| ColumnFilter(1) | 1 | 0 | | keep columns n, AGGREGATION119 |
| EagerAggregation | 1 | 0 | | n |
| SimplePatternMatcher | 3 | 0 | n, post, UNNAMED84 | |
| SchemaIndex | 1 | 2 | n, n | { AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+
Total database accesses: 2
性能会更高,因为它将使用图表的所有功能,并且只遵循来自节点的关系,导致数据库访问不多于匹配用户节点,因此不过滤标签。
这是我的 0,02€