Neo4j:标签与索引 属性?

Neo4j: label vs. indexed property?

假设您是 Twitter,并且:

您可以为这些推文添加 标签,例如:AwaitingModeration,或添加索引 a 属性,例如isAwaitingModeration = true|false.

一种选择是否天生就比另一种更好?

我知道最好的答案可能是尝试对两者进行负载测试 :),但是 Neo4j 的实施 POV 是否有任何东西可以使一个选项更健壮或更适合这种查询?

它是否取决于在任何给定时刻处于此状态的推文数量?如果它在 10 年代与 1000 年代,那有什么不同吗?

我的印象是标签更适合大量的节点,而索引属性更适合较小的数量(理想情况下,唯一节点),但我不确定这是否真的如此。

谢谢!

更新:跟进 blog post 已发布。

这是我们为客户建模数据集时的常见问题,也是 Active/NonActive 实体的典型用例。

这是我对 Neo4j2.1.6 有效的体验的一点反馈:

点 1. 在标签或索引 属性 和 return 节点匹配之间,数据库访问不会有差异

点2.当这样的节点在模式的末尾时会遇到差异,例如

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;

-

PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                                      Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                                      keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                                      posts |
|      ColumnFilter(1) |    1 |      0 |                      |                                           keep columns n,   AGGREGATION153 |
|     EagerAggregation |    1 |      0 |                      |                                                                          n |
|               Filter |    1 |      3 |                      | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == {  AUTOBOOL1}) |
| SimplePatternMatcher |    1 |     12 | n, post,   UNNAMED84 |                                                                            |
|          SchemaIndex |    1 |      2 |                 n, n |                                                {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+

Total database accesses: 17

在这种情况下,Cypher 将不会使用索引 :Post(published)

因此,如果您有一个 ActivePost 标签,例如,标签的使用性能更高。 :

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION130 |
|     EagerAggregation |    1 |      0 |                      |                                n |
|               Filter |    1 |      1 |                      |     hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher |    1 |      4 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 7

第 3 点。 始终使用标签表示正例,这意味着对于上述情况,具有 Draft 标签将强制您执行以下查询:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post :Draft
RETURN n, collect(post) as posts;

意味着 Cypher 将打开每个节点标签 headers 并对其进行过滤。

要点 4. 避免需要匹配多个标签

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                         Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                         keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                         posts |
|      ColumnFilter(1) |    1 |      0 |                      |                              keep columns n,   AGGREGATION139 |
|     EagerAggregation |    1 |      0 |                      |                                                             n |
|               Filter |    1 |      2 |                      | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher |    1 |      8 | n, post,   UNNAMED84 |                                                               |
|          SchemaIndex |    1 |      2 |                 n, n |                                   {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+

Total database accesses: 12

这将对 Cypher 产生与第 3 点相同的过程。

第 5 点。 如果可能,通过正确键入命名关系来避免在标签上进行匹配

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts

-

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 3     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +SimplePatternMatcher
          |
          +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION119 |
|     EagerAggregation |    1 |      0 |                      |                                n |
| SimplePatternMatcher |    3 |      0 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 2

性能会更高,因为它将使用图表的所有功能,并且只遵循来自节点的关系,导致数据库访问不多于匹配用户节点,因此不过滤标签。

这是我的 0,02€