Neo4j/Cypher 使用超大子图排序的有效分页
Neo4j/Cypher effective pagination with order by over large sub-graph
我在 (:User)
个节点之间有以下简单关系。
(:User)-[:FOLLOWS {timestamp}]->(:User)
如果我对按 FOLLOWS.timestamp
排序的关注者进行分页,那么当某人拥有数百万关注者时,我会 运行 遇到性能问题。
MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100
在需要排序时对大数据集进行分页的建议方法是什么?
更新
follower timestamp
---------------------------------------
id(1000000) 1455967905
id(999999) 1455967875
id(999998) 1455967234
id(999997) 1455967123
id(999996) 1455965321
id(999995) 1455964123
id(999994) 1455963645
id(999993) 1455963512
id(999992) 1455961343
....
id(2) 1455909382
id(1) 1455908432
我想使用在 :FOLLOWS
关系上设置的时间戳将此列表切分。如果我想要 return 批次的 4 个关注者,我首先获取当前时间戳,return 4 个最近,然后是 1455967123 和 4 个最近,依此类推。为了做到这一点,整个列表应该按时间戳排序,这会导致数百万条记录出现性能问题。
如果要查找最近的关注者,即时间戳大于给定时间的地方,则只需遍历最近的关注者。
您可以在 20ms
内使用 (2) 完成
如果您真的在寻找最老的(第一个)关注者,则可以跳过并且不要查看每百万关注者中的每一个的时间戳(在我的系统上大约需要 1 秒,请参阅 (3) ).如果你向前跳过,时间会下降到 230ms,请参阅 (1)
一般来说,我们可以看到在我的笔记本电脑上它每秒执行 2M 个数据库操作。
(1) 查看第一个/最老的关注者
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> // skip ahead
> WITH f,follower SKIP 999000
> // do the actual check
> WITH f,follower WHERE f.ts < 500
> RETURN f, follower
> ORDER BY f.ts
> LIMIT 10;
+---------------------------------+
| f | follower |
+---------------------------------+
| :FOLLOWS[0]{ts:1} | Node[1]{} |
...
+---------------------------------+
10 rows
243 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, f, follower, u | anon[155]; anon[158] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Top | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, u | Literal(10); |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 499 | anon[142], anon[155], anon[158], anon[178], f, follower, u | anon[155]; anon[158]; anon[155].ts |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 0 | anon[142], anon[155], anon[158], f, follower, u | f; follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Filter | 1 | 499 | 0 | anon[142], f, follower, u | anon[142] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 1000 | 1000 | anon[142], f, follower, u | f; follower; f.ts < { AUTOINT2} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Skip | 1 | 1000 | 0 | f, follower, u | { AUTOINT1} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Expand(All) | 1 | 1000000 | 1000001 | f, follower, u | (u)<-[ f@12:FOLLOWS]-( follower@24) |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
Total database accesses: 1001501
(2) 查看最近的关注者
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts > 999500
> RETURN f, follower
> LIMIT 10;
+----------------------------------------------+
| f | follower |
+----------------------------------------------+
| :FOLLOWS[999839]{ts:999840} | Node[999840]{} |
...
+----------------------------------------------+
10 rows
23 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 16394 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts > { AUTOINT1}) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 16394 | 16395 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 32790
(3) 在不跳过的情况下找到最早的关注者
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts < 500
> RETURN f, follower
> LIMIT 10;
+-------------------------------------+
| f | follower |
+-------------------------------------+
...
| :FOLLOWS[491]{ts:492} | Node[492]{} |
+-------------------------------------+
10 rows
1008 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 999498 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts < { AUTOINT1}) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 999498 | 999499 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 1998998
我在 (:User)
个节点之间有以下简单关系。
(:User)-[:FOLLOWS {timestamp}]->(:User)
如果我对按 FOLLOWS.timestamp
排序的关注者进行分页,那么当某人拥有数百万关注者时,我会 运行 遇到性能问题。
MATCH (u:User {Id:{id}})<-[f:FOLLOWS]-(follower)
WHERE f.timestamp <= {timestamp}
RETURN follower
ORDER BY f.timestamp DESC
LIMIT 100
在需要排序时对大数据集进行分页的建议方法是什么?
更新
follower timestamp
---------------------------------------
id(1000000) 1455967905
id(999999) 1455967875
id(999998) 1455967234
id(999997) 1455967123
id(999996) 1455965321
id(999995) 1455964123
id(999994) 1455963645
id(999993) 1455963512
id(999992) 1455961343
....
id(2) 1455909382
id(1) 1455908432
我想使用在 :FOLLOWS
关系上设置的时间戳将此列表切分。如果我想要 return 批次的 4 个关注者,我首先获取当前时间戳,return 4 个最近,然后是 1455967123 和 4 个最近,依此类推。为了做到这一点,整个列表应该按时间戳排序,这会导致数百万条记录出现性能问题。
如果要查找最近的关注者,即时间戳大于给定时间的地方,则只需遍历最近的关注者。
您可以在 20ms
内使用 (2) 完成如果您真的在寻找最老的(第一个)关注者,则可以跳过并且不要查看每百万关注者中的每一个的时间戳(在我的系统上大约需要 1 秒,请参阅 (3) ).如果你向前跳过,时间会下降到 230ms,请参阅 (1)
一般来说,我们可以看到在我的笔记本电脑上它每秒执行 2M 个数据库操作。
(1) 查看第一个/最老的关注者
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> // skip ahead
> WITH f,follower SKIP 999000
> // do the actual check
> WITH f,follower WHERE f.ts < 500
> RETURN f, follower
> ORDER BY f.ts
> LIMIT 10;
+---------------------------------+
| f | follower |
+---------------------------------+
| :FOLLOWS[0]{ts:1} | Node[1]{} |
...
+---------------------------------+
10 rows
243 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, f, follower, u | anon[155]; anon[158] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Top | 1 | 10 | 0 | anon[142], anon[155], anon[158], anon[178], f, follower, u | Literal(10); |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 499 | anon[142], anon[155], anon[158], anon[178], f, follower, u | anon[155]; anon[158]; anon[155].ts |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 499 | 0 | anon[142], anon[155], anon[158], f, follower, u | f; follower |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Filter | 1 | 499 | 0 | anon[142], f, follower, u | anon[142] |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Projection | 1 | 1000 | 1000 | anon[142], f, follower, u | f; follower; f.ts < { AUTOINT2} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Skip | 1 | 1000 | 0 | f, follower, u | { AUTOINT1} |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +Expand(All) | 1 | 1000000 | 1000001 | f, follower, u | (u)<-[ f@12:FOLLOWS]-( follower@24) |
| | +----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+---------+---------+-------------------------------------------------------------------------+---------------------------------------+
Total database accesses: 1001501
(2) 查看最近的关注者
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts > 999500
> RETURN f, follower
> LIMIT 10;
+----------------------------------------------+
| f | follower |
+----------------------------------------------+
| :FOLLOWS[999839]{ts:999840} | Node[999840]{} |
...
+----------------------------------------------+
10 rows
23 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 16394 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts > { AUTOINT1}) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 16394 | 16395 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+-------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+-------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 32790
(3) 在不跳过的情况下找到最早的关注者
PROFILE
> MATCH (u)<-[f:FOLLOWS]-(follower) WHERE id(u) = 0
> AND f.ts < 500
> RETURN f, follower
> LIMIT 10;
+-------------------------------------+
| f | follower |
+-------------------------------------+
...
| :FOLLOWS[491]{ts:492} | Node[492]{} |
+-------------------------------------+
10 rows
1008 ms
Compiler CYPHER 2.3 Planner COST Runtime INTERPRETED
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Identifiers | Other |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
| +ProduceResults | 1 | 10 | 0 | f, follower | f, follower |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Limit | 1 | 10 | 0 | f, follower, u | Literal(10) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Filter | 1 | 10 | 999498 | f, follower, u | AndedPropertyComparablePredicates(f,f.ts,f.ts < { AUTOINT1}) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +Expand(All) | 1 | 999498 | 999499 | f, follower, u | (u)<-[f:FOLLOWS]-(follower) |
| | +----------------+--------+---------+----------------+---------------------------------------------------------------+
| +NodeByIdSeek | 1 | 1 | 1 | u | |
+-----------------+----------------+--------+---------+----------------+---------------------------------------------------------------+
Total database accesses: 1998998