Use of apoc.periodic.commit or other approaches for batching a large serial query

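I was advised to use apoc.periodic.commit to batch a large query I am running in neo4j. My code below does not appear to batch and commit after each step. The server runs out of memory, which I think should not happen if it were committing after each item.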

I am computing the Jaccard index for a set of nodes (I have named the property paradig, for "paradigmatic relation", because this is a set of next-word relationships in a text).
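For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch on two literal lists, assuming neither list contains duplicates (as is the case after collect(DISTINCT ...)); the names listA, listB, inter and uni are illustrative only. Note that plain list concatenation (listA + listB) keeps duplicates, so this sketch appends only the elements of listB that are not already in listA.

```cypher
// Two example neighbour lists (hypothetical data)
WITH ['a', 'b', 'c'] AS listA, ['b', 'c', 'd'] AS listB
// Intersection: elements of listA that also appear in listB
WITH listA, listB, [x IN listA WHERE x IN listB] AS inter
// Union: listA plus the elements of listB not already in listA
WITH inter, listA + [x IN listB WHERE NOT x IN listA] AS uni
// 2 shared elements out of 4 distinct ones gives 0.5
RETURN 1.0 * size(inter) / size(uni) AS jaccard
```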

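Computing this for every node is a fairly large job. I am computing it for 53 nodes, but the whole population is about 60k, and this is an n^2 operation. If I run it in a single transaction, I run out of memory. So I want to run it in batches, committing after each index is computed. I have flagged the nodes I need to process with the property toProcess, and I am running the code below to compute the Jaccard index.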

1) Am I using apoc incorrectly?

2) Is there a better, more neo4j-centric way to do this? I have always worked with SQL.

call apoc.periodic.commit("
MATCH (s:Word{toProcess: True})
MATCH (w:Word)-[:NEXT_WORD]->(s)
WITH collect(DISTINCT w.name) as left1, s
MATCH (w:Word)<-[:NEXT_WORD]-(s)
WITH left1, s, collect(DISTINCT w.name) as right1
// Match every other word
MATCH (o:Word) WHERE NOT s = o
WITH left1, right1, s, o
// Get other right, other left1
MATCH (w:Word)-[:NEXT_WORD]->(o)
WITH collect(DISTINCT w.name) as left1_o, s, o, right1, left1
MATCH (w:Word)<-[:NEXT_WORD]-(o)
WITH left1_o, s, o, right1, left1, collect(DISTINCT w.name) as right1_o
// compute right1 union, intersect
WITH FILTER(x IN right1 WHERE x IN right1_o) as r1_intersect,
  (right1 + right1_o) AS r1_union, s, o, right1, left1, right1_o, left1_o
// compute left1 union, intersect
WITH FILTER(x IN left1 WHERE x IN left1_o) as l1_intersect,
  (left1 + left1_o) AS l1_union, r1_intersect, r1_union, s, o
WITH DISTINCT r1_union as r1_union, l1_union as l1_union, r1_intersect, l1_intersect, s, o
WITH 1.0*size(r1_intersect) / size(r1_union) as r1_jaccard,
  1.0*size(l1_intersect) / size(l1_union) as l1_jaccard,
  s, o
WITH s, o, r1_jaccard, l1_jaccard, r1_jaccard + l1_jaccard as sim
MERGE (s)-[r:RELATED_TO]->(o) SET r.paradig = sim
set s.toProcess = false
",{batchSize:1, parallel:false})

Rationale:

batchSize:1: I want it to commit after each Jaccard index is set.

parallel:false: I want the operation to run serially so that I do not run out of memory.

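I got this working by using apoc.periodic.iterate instead of apoc.periodic.commit, as shown below.

I have marked this as the correct answer because quite some time has passed since the question was asked. That said, I would not be surprised if there is a better way.

I have found it hard to find best practices for batch updates like this in neo4j, and I am not enough of an expert to know whether this is the best (or even a halfway decent) practice.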

call apoc.periodic.iterate("

MATCH (s:Word) where s.toProcess=true
return s", 
"MATCH (w:Word)-[:NEXT_WORD]->(s)
WITH collect(DISTINCT w.name) as left1, s
MATCH (w:Word)<-[:NEXT_WORD]-(s)
WITH left1, s, collect(DISTINCT w.name) as right1
// Match every other word
MATCH (o:Word) WHERE NOT s = o
WITH left1, right1, s, o
// Get other right, other left1
MATCH (w:Word)-[:NEXT_WORD]->(o)
WITH collect(DISTINCT w.name) as left1_o, s, o, right1, left1
MATCH (w:Word)<-[:NEXT_WORD]-(o)
WITH left1_o, s, o, right1, left1, collect(DISTINCT w.name) as right1_o
// compute right1 union, intersect
WITH FILTER(x IN right1 WHERE x IN right1_o) as r1_intersect,
  (right1 + right1_o) AS r1_union, s, o, right1, left1, right1_o, left1_o
// compute left1 union, intersect
WITH FILTER(x IN left1 WHERE x IN left1_o) as l1_intersect,
  (left1 + left1_o) AS l1_union, r1_intersect, r1_union, s, o
WITH DISTINCT r1_union as r1_union, l1_union as l1_union, r1_intersect, l1_intersect, s, o
WITH 1.0*size(r1_intersect) / size(r1_union) as r1_jaccard,
  1.0*size(l1_intersect) / size(l1_union) as l1_jaccard,
  s, o
WITH s, o, r1_jaccard, l1_jaccard, r1_jaccard + l1_jaccard as sim
MERGE (s)-[r:RELATED_TO]->(o) SET r.paradig = sim
set s.toProcess = false",
{batchSize:1})
yield batches, total return batches, total
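
For comparison, here is a hedged sketch of the shape apoc.periodic.commit expects, as far as I understand it: the second argument is a map of query parameters (not a batchSize/parallel config), and the statement is re-run in a new transaction until it returns 0, so it needs a LIMIT and a RETURN of a count. The $limit parameter name is my own choice, the $param syntax assumes a reasonably recent Neo4j version, and the similarity computation is left as a placeholder comment rather than repeated.

```cypher
CALL apoc.periodic.commit("
  // Pick up to $limit unprocessed words for this pass
  MATCH (s:Word {toProcess: true})
  WITH s LIMIT $limit
  // ... similarity computation and MERGE for s would go here ...
  SET s.toProcess = false
  // A non-zero count triggers another pass; 0 ends the loop
  RETURN count(*)
", {limit: 1})
```

One caveat with this pattern: if the statement ever returns 0 rows for a word that is still flagged (for example, a word with no incoming NEXT_WORD relationships once the real MATCH clauses are added), the loop stops early and that word stays flagged. apoc.periodic.iterate avoids this because the driving query is evaluated once up front.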