Neo4j/Cypher 使用超大子图排序的有效分页

Neo4j/Cypher effective pagination with order by over large sub-graph

我在 (:User) 个节点之间有以下简单关系。

(:User)-[:FOLLOWS {timestamp}]->(:User)

如果我对按 FOLLOWS.timestamp 排序的关注者进行分页,那么当某人拥有数百万关注者时,我会 运行 遇到性能问题。

MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100

在需要排序时对大数据集进行分页的建议方法是什么?

更新

follower             timestamp
---------------------------------------
id(1000000)          1455967905
id(999999)           1455967875
id(999998)           1455967234
id(999997)           1455967123
id(999996)           1455965321
id(999995)           1455964123
id(999994)           1455963645
id(999993)           1455963512
id(999992)           1455961343
....
id(2)                1455909382
id(1)                1455908432

我想使用在 :FOLLOWS 关系上设置的时间戳将此列表切分。如果我想要 return 批次的 4 个关注者,我首先获取当前时间戳,return 4 个最近,然后是 1455967123 和 4 个最近,依此类推。为了做到这一点,整个列表应该按时间戳排序,这会导致数百万条记录出现性能问题。

如果要查找最近的关注者,即时间戳大于给定时间的地方,则只需遍历最近的关注者。

您可以在 20ms

内使用 (2) 完成

如果您真的在寻找最老的(第一个)关注者,则可以跳过并且不要查看每百万关注者中的每一个的时间戳(在我的系统上大约需要 1 秒,请参阅 (3) ).如果你向前跳过,时间会下降到 230ms,请参阅 (1)

一般来说,我们可以看到在我的笔记本电脑上它每秒执行 2M 个数据库操作。

(1) 查看第一个/最老的关注者

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> // skip ahead
> WITH f,follower SKIP 999000
> // do the actual check
> WITH f,follower WHERE f.ts < 500
> RETURN f, follower
> ORDER BY f.ts
> LIMIT 10;
+---------------------------------+
| f                  | follower   |
+---------------------------------+
| :FOLLOWS[0]{ts:1}  | Node[1]{}  |
...
+---------------------------------+
10 rows
243 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| Operator        | Estimated Rows | Rows    | DB Hits | Identifiers                                                             | Other                                 |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +ProduceResults |              1 |      10 |       0 | f, follower                                                             | f, follower                           |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |      10 |       0 | anon[142], anon[155], anon[158], anon[178], f, follower, f, follower, u | anon[155]; anon[158]                  |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Top            |              1 |      10 |       0 | anon[142], anon[155], anon[158], anon[178], f, follower, u              | Literal(10);                          |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |     499 |     499 | anon[142], anon[155], anon[158], anon[178], f, follower, u              | anon[155]; anon[158]; anon[155].ts    |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |     499 |       0 | anon[142], anon[155], anon[158], f, follower, u                         | f; follower                           |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Filter         |              1 |     499 |       0 | anon[142], f, follower, u                                               | anon[142]                             |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection     |              1 |    1000 |    1000 | anon[142], f, follower, u                                               | f; follower; f.ts < {  AUTOINT2}      |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Skip           |              1 |    1000 |       0 | f, follower, u                                                          | {  AUTOINT1}                          |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Expand(All)    |              1 | 1000000 | 1000001 | f, follower, u                                                          | (u)<-[  f@12:FOLLOWS]-(  follower@24) |
| |               +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +NodeByIdSeek   |              1 |       1 |       1 | u                                                                       |                                       |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+

Total database accesses: 1001501

(2) 查看最近的关注者

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts > 999500
> RETURN f, follower
> LIMIT 10;
+----------------------------------------------+
| f                           | follower       |
+----------------------------------------------+
| :FOLLOWS[999839]{ts:999840} | Node[999840]{} |
...
+----------------------------------------------+
10 rows
23 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| Operator        | Estimated Rows | Rows  | DB Hits | Identifiers    | Other                                                         |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults |              1 |    10 |       0 | f, follower    | f, follower                                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Limit          |              1 |    10 |       0 | f, follower, u | Literal(10)                                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Filter         |              1 |    10 |   16394 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts > {  AUTOINT1}) |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Expand(All)    |              1 | 16394 |   16395 | f, follower, u | (u)<-[f:FOLLOWS]-(follower)                                   |
| |               +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek   |              1 |     1 |       1 | u              |                                                               |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+

Total database accesses: 32790

(3) 在不跳过的情况下找到最早的关注者

PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts < 500
> RETURN f, follower
> LIMIT 10;
+-------------------------------------+
| f                     | follower    |
+-------------------------------------+
...
| :FOLLOWS[491]{ts:492} | Node[492]{} |
+-------------------------------------+
10 rows
1008 ms

Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED

+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| Operator        | Estimated Rows | Rows   | DB Hits | Identifiers    | Other                                                         |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults |              1 |     10 |       0 | f, follower    | f, follower                                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Limit          |              1 |     10 |       0 | f, follower, u | Literal(10)                                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Filter         |              1 |     10 |  999498 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts < {  AUTOINT1}) |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Expand(All)    |              1 | 999498 |  999499 | f, follower, u | (u)<-[f:FOLLOWS]-(follower)                                   |
| |               +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek   |              1 |      1 |       1 | u              |                                                               |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+

Total database accesses: 1998998