Recursive query with sub-graph aggregation (arbitrary depth)
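I previously asked a question about aggregating quantities along a graph. The two answers provided worked well, but I am now trying to extend the Cypher query to a graph of variable depth.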
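To summarize: we start with a set of leaf stores, each associated with a specific supplier, which is stored as a property on the Store node. Inventory is then moved to other stores, and each supplier's share of a movement corresponds to its contribution to the originating store.
So for node B02, S2 contributed 750/1250 = 60% and S3 contributed 40%. We then move 600 units out of B02, of which 60% belong to S2 and 40% belong to S3, and so on.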
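We want to know what percentage of the final 700 units in D01 belongs to each supplier. Suppliers with the same name are the same supplier. So for this graph we expect: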
S1, 38.09
S2, 27.61
S3, 34.28
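For reference, the proportional rule described above can be checked outside the database. The following is a minimal Python sketch, not part of the original question: the names MOVES, LEAF_SUPPLIER and supplier_shares are made up for illustration, and the quantities are copied from the MOVE_TO relationships of this question's graph.

```python
from collections import defaultdict

# (source, target, quantity) triples copied from the question's MOVE_TO relationships.
MOVES = [
    ("A01", "B01", 750), ("A02", "B01", 500),
    ("A03", "B02", 750), ("A04", "B02", 500),
    ("A05", "B03", 100), ("A06", "B03", 200),
    ("A07", "B04", 50), ("A08", "B04", 450),
    ("B01", "C01", 400), ("B02", "C01", 600),
    ("B03", "C02", 100), ("B04", "C02", 200),
    ("C01", "D01", 500), ("C02", "D01", 200),
]
# Supplier property of each leaf store.
LEAF_SUPPLIER = {"A01": "S1", "A02": "S1", "A03": "S2", "A04": "S3",
                 "A05": "S1", "A06": "S1", "A07": "S2", "A08": "S3"}


def supplier_shares(store):
    """Return {supplier: fraction} for the stock held in `store`."""
    if store in LEAF_SUPPLIER:
        return {LEAF_SUPPLIER[store]: 1.0}
    inbound = [(src, qty) for src, dst, qty in MOVES if dst == store]
    total = sum(qty for _, qty in inbound)
    shares = defaultdict(float)
    for src, qty in inbound:
        # Each inbound movement carries the source store's mix, weighted by quantity.
        for supplier, frac in supplier_shares(src).items():
            shares[supplier] += frac * qty / total
    return dict(shares)


for supplier, frac in sorted(supplier_shares("D01").items()):
    print(supplier, round(100 * frac, 2))
```

This prints roughly 38.1, 27.62 and 34.29 for S1, S2 and S3, which agrees with the expected values above once you allow for those being truncated rather than rounded.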
I have prepared the graph with this Cypher script:
CREATE (A01:Store {Name: 'A01', Supplier: 'S1'})
CREATE (A02:Store {Name: 'A02', Supplier: 'S1'})
CREATE (A03:Store {Name: 'A03', Supplier: 'S2'})
CREATE (A04:Store {Name: 'A04', Supplier: 'S3'})
CREATE (A05:Store {Name: 'A05', Supplier: 'S1'})
CREATE (A06:Store {Name: 'A06', Supplier: 'S1'})
CREATE (A07:Store {Name: 'A07', Supplier: 'S2'})
CREATE (A08:Store {Name: 'A08', Supplier: 'S3'})
CREATE (B01:Store {Name: 'B01'})
CREATE (B02:Store {Name: 'B02'})
CREATE (B03:Store {Name: 'B03'})
CREATE (B04:Store {Name: 'B04'})
CREATE (C01:Store {Name: 'C01'})
CREATE (C02:Store {Name: 'C02'})
CREATE (D01:Store {Name: 'D01'})
CREATE (A01)-[:MOVE_TO {Quantity: 750}]->(B01)
CREATE (A02)-[:MOVE_TO {Quantity: 500}]->(B01)
CREATE (A03)-[:MOVE_TO {Quantity: 750}]->(B02)
CREATE (A04)-[:MOVE_TO {Quantity: 500}]->(B02)
CREATE (A05)-[:MOVE_TO {Quantity: 100}]->(B03)
CREATE (A06)-[:MOVE_TO {Quantity: 200}]->(B03)
CREATE (A07)-[:MOVE_TO {Quantity: 50}]->(B04)
CREATE (A08)-[:MOVE_TO {Quantity: 450}]->(B04)
CREATE (B01)-[:MOVE_TO {Quantity: 400}]->(C01)
CREATE (B02)-[:MOVE_TO {Quantity: 600}]->(C01)
CREATE (B03)-[:MOVE_TO {Quantity: 100}]->(C02)
CREATE (B04)-[:MOVE_TO {Quantity: 200}]->(C02)
CREATE (C01)-[:MOVE_TO {Quantity: 500}]->(D01)
CREATE (C02)-[:MOVE_TO {Quantity: 200}]->(D01)
The current query looks like this:
MATCH (s:Store { Name:'D01' })
MATCH (s)<-[t:MOVE_TO]-()<-[r:MOVE_TO]-(supp)
WITH t.Quantity as total, collect(r) as movements
WITH total, movements, reduce(totalSupplier = 0, r IN movements | totalSupplier + r.Quantity) as supCount
UNWIND movements as movement
RETURN startNode(movement).Supplier as Supplier, round(100.0*movement.Quantity/supCount) as pct
I am trying to use a variable-length relationship, along these lines:
MATCH (s)<-[t:MOVE_TO]-()<-[r:MOVE_TO*]-(supp)
However, this yields multiple paths to the end node, and I think I need to aggregate the inventory at each node along the way.
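I can't think of a way to solve this in pure Cypher, because I don't think you can do this kind of recursion in Cypher. However, you can use Cypher to return all the data in the tree in a simple form, so that you can do the calculation in your favorite programming language. Something like this: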
MATCH path=(source:Store)-[move:MOVE_TO*]->(target:Store {Name: 'D01'})
WHERE source.Supplier IS NOT NULL
RETURN
source.Supplier,
reduce(a=[], move IN relationships(path)| a + [{id: ID(move), Quantity: move.Quantity}])
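This returns the ID and quantity of every relationship on every path. You can then process that client-side (perhaps by first converting it into a nested data structure?).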
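One way the client-side step could look is sketched below in Python. It is only an illustration under a few assumptions: each row is a (supplier, relationships) pair shaped like the query's output, the relationships are listed in path order from the source store to the target, all stock originates at stores that have a Supplier, and the relationship IDs in ROWS are invented (real IDs come from the database). The names shares_from_paths, path and ROWS are hypothetical.

```python
from collections import defaultdict


def shares_from_paths(rows):
    """rows: (supplier, [{"id": ..., "Quantity": ...}, ...]) pairs, one per path."""
    rows = list(rows)
    # Relationships followed by the same relationship (or ending a path) flow into
    # the same store, so grouping by successor gives each store's inbound total.
    inbound = defaultdict(dict)  # successor id -> {relationship id: quantity}
    for _, rels in rows:
        for i, rel in enumerate(rels):
            successor = rels[i + 1]["id"] if i + 1 < len(rels) else None
            inbound[successor][rel["id"]] = rel["Quantity"]
    totals = {succ: sum(group.values()) for succ, group in inbound.items()}

    shares = defaultdict(float)
    for supplier, rels in rows:
        # A path's weight is the product of each hop's share of its target's inflow.
        weight = 1.0
        for i, rel in enumerate(rels):
            successor = rels[i + 1]["id"] if i + 1 < len(rels) else None
            weight *= rel["Quantity"] / totals[successor]
        shares[supplier] += weight
    return dict(shares)


def path(*pairs):
    """Build one relationship list from (id, quantity) pairs."""
    return [{"id": rel_id, "Quantity": qty} for rel_id, qty in pairs]


# Hypothetical IDs 0..13; quantities are the ones from the question's graph.
ROWS = [
    ("S1", path((0, 750), (8, 400), (12, 500))),
    ("S1", path((1, 500), (8, 400), (12, 500))),
    ("S2", path((2, 750), (9, 600), (12, 500))),
    ("S3", path((3, 500), (9, 600), (12, 500))),
    ("S1", path((4, 100), (10, 100), (13, 200))),
    ("S1", path((5, 200), (10, 100), (13, 200))),
    ("S2", path((6, 50), (11, 200), (13, 200))),
    ("S3", path((7, 450), (11, 200), (13, 200))),
]

for supplier, frac in sorted(shares_from_paths(ROWS).items()):
    print(supplier, round(100 * frac, 2))
```

Summing path weights per supplier avoids building a nested structure first; the trade-off is that the number of paths can grow quickly in graphs where stores fan out to many targets.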
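As I said before, I enjoyed this question. I know you have already accepted an answer, but I decided to post my final response anyway, because it also returns the percentages with no client-side effort (which means you can also do a SET on the nodes to update the values in the database whenever you need to), and of course so that I can come back to it if I need it for any other reason :)
Here is the link to the console example.
It returns a row with the store name, the sum moved to it from all suppliers, and each supplier's percentage (the Percentile field in the output).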
MATCH p = (s)<-[:MOVE_TO*]-(sup)
WHERE HAS (sup.Supplier) AND NOT HAS (s.Supplier)
WITH s,sup,reduce(totalSupplier = 0, r IN relationships(p)| totalSupplier + r.Quantity) AS TotalAmountMoved
WITH sum(TotalAmountMoved) AS sumMoved, collect(DISTINCT ([sup.Supplier, TotalAmountMoved])) AS MyDataPart1,s
WITH reduce(b=[], c IN MyDataPart1| b +[{ Supplier: c[0], Quantity: c[1], Percentile: ((c[1]*1.00))/(sumMoved*1.00)*100.00 }]) AS MyData, s, sumMoved
RETURN s.Name, sumMoved, MyData
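This solution produces correct results for arbitrary graphs that conform to the model described in the question. (When Store x moves goods to Store y, the Supplier percentages of the moved goods are assumed to be the same as those of Store x.)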
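However, this solution does not consist of just a single Cypher query (since that may not be possible). Instead, it involves multiple queries, one of which must be run repeatedly until the calculation has cascaded through the entire graph of Store nodes. That iterative query tells you clearly when to stop iterating. The other Cypher queries are needed to: prepare the graph for iteration, report the supplier percentages for the "end" node(s), and clean up the graph (so that it reverts to the state it was in before the first step below).
These queries could probably be optimized further.
Here are the required steps: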
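Prepare the graph for the iterative query (initialize the temporary pcts array on all starting Store nodes). This includes creating a singleton Suppliers node that holds an array of all supplier names. That array is used to establish the order of the elements in the temporary pcts arrays, and to map those elements back to the correct supplier names.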
MATCH (store:Store)
WHERE HAS (store.Supplier)
WITH COLLECT(store) AS stores, COLLECT(DISTINCT store.Supplier) AS csup
CREATE (sups:Suppliers { names: csup })
WITH stores, sups
UNWIND stores AS store
SET store.pcts =
EXTRACT(i IN RANGE(0,LENGTH(sups.names)-1,1) |
CASE WHEN store.Supplier = sups.names[i] THEN 1.0 ELSE 0.0 END)
RETURN store.Name, store.Supplier, store.pcts;
Here is the result with the question's data:
+---------------------------------------------+
| store.Name | store.Supplier | store.pcts |
+---------------------------------------------+
| "A01" | "S1" | [1.0,0.0,0.0] |
| "A02" | "S1" | [1.0,0.0,0.0] |
| "A03" | "S2" | [0.0,1.0,0.0] |
| "A04" | "S3" | [0.0,0.0,1.0] |
| "A05" | "S1" | [1.0,0.0,0.0] |
| "A06" | "S1" | [1.0,0.0,0.0] |
| "A07" | "S2" | [0.0,1.0,0.0] |
| "A08" | "S3" | [0.0,0.0,1.0] |
+---------------------------------------------+
8 rows
83 ms
Nodes created: 1
Properties set: 9
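To make the role of the Suppliers node and the pcts arrays more concrete, here is a rough Python equivalent of the preparation step. It is a sketch only: init_pcts and the leaf dictionary are hypothetical names, and the supplier order is simply first-seen order, which happens to match the output shown above.

```python
def init_pcts(leaf_supplier):
    """leaf_supplier: {store name: supplier name} for the starting stores.

    Returns (names, pcts): the ordered supplier names and one vector per store
    with 1.0 at the position of that store's supplier and 0.0 elsewhere.
    """
    names = []
    for supplier in leaf_supplier.values():
        if supplier not in names:  # keep first-seen order, drop duplicates
            names.append(supplier)
    pcts = {store: [1.0 if supplier == name else 0.0 for name in names]
            for store, supplier in leaf_supplier.items()}
    return names, pcts


# Supplier property of each leaf store, copied from the question's script.
leaves = {"A01": "S1", "A02": "S1", "A03": "S2", "A04": "S3",
          "A05": "S1", "A06": "S1", "A07": "S2", "A08": "S3"}
names, pcts = init_pcts(leaves)
print(names)
for store in sorted(pcts):
    print(store, leaves[store], pcts[store])
```

The position of each element is what ties a number back to a supplier, which is why the list of names has to be stored alongside the vectors.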
Iterative query (run repeatedly until it returns 0 rows)
MATCH p=(s1:Store)-[m:MOVE_TO]->(s2:Store)
WHERE HAS(s1.pcts) AND NOT HAS(s2.pcts)
SET s2.pcts = EXTRACT(i IN RANGE(1,LENGTH(s1.pcts),1) | 0)
WITH s2, COLLECT(p) AS ps
WITH s2, ps, REDUCE(s=0, p IN ps | s + HEAD(RELATIONSHIPS(p)).Quantity) AS total
FOREACH(p IN ps |
SET HEAD(RELATIONSHIPS(p)).pcts = EXTRACT(parentPct IN HEAD(NODES(p)).pcts | parentPct * HEAD(RELATIONSHIPS(p)).Quantity / total)
)
FOREACH(p IN ps |
SET s2.pcts = EXTRACT(i IN RANGE(0,LENGTH(s2.pcts)-1,1) | s2.pcts[i] + HEAD(RELATIONSHIPS(p)).pcts[i])
)
RETURN s2.Name, s2.pcts, total, EXTRACT(p IN ps | HEAD(RELATIONSHIPS(p)).pcts) AS rel_pcts;
Iteration 1 result:
+-----------------------------------------------------------------------------------------------+
| s2.Name | s2.pcts | total | rel_pcts |
+-----------------------------------------------------------------------------------------------+
| "B04" | [0.0,0.1,0.9] | 500 | [[0.0,0.1,0.0],[0.0,0.0,0.9]] |
| "B01" | [1.0,0.0,0.0] | 1250 | [[0.6,0.0,0.0],[0.4,0.0,0.0]] |
| "B03" | [1.0,0.0,0.0] | 300 | [[0.3333333333333333,0.0,0.0],[0.6666666666666666,0.0,0.0]] |
| "B02" | [0.0,0.6,0.4] | 1250 | [[0.0,0.6,0.0],[0.0,0.0,0.4]] |
+-----------------------------------------------------------------------------------------------+
4 rows
288 ms
Properties set: 24
Iteration 2 result:
+-------------------------------------------------------------------------------------------------------------------------------+
| s2.Name | s2.pcts | total | rel_pcts |
+-------------------------------------------------------------------------------------------------------------------------------+
| "C02" | [0.3333333333333333,0.06666666666666667,0.6] | 300 | [[0.3333333333333333,0.0,0.0],[0.0,0.06666666666666667,0.6]] |
| "C01" | [0.4,0.36,0.24] | 1000 | [[0.4,0.0,0.0],[0.0,0.36,0.24]] |
+-------------------------------------------------------------------------------------------------------------------------------+
2 rows
193 ms
Properties set: 12
Iteration 3 result:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| s2.Name | s2.pcts | total | rel_pcts |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "D01" | [0.38095238095238093,0.27619047619047615,0.34285714285714286] | 700 | [[0.2857142857142857,0.2571428571428571,0.17142857142857143],[0.09523809523809522,0.01904761904761905,0.17142857142857143]] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
40 ms
Properties set: 6
Iteration 4 result:
+--------------------------------------+
| s2.Name | s2.pcts | total | rel_pcts |
+--------------------------------------+
+--------------------------------------+
0 rows
69 ms
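The cascade performed by the iterative query can be mimicked in a few lines of Python, which may help when reasoning about when it stops. This is a sketch with hypothetical names (MOVES, propagate), not a translation of the query. One deliberate difference, which is my own variation: the Cypher query updates a store as soon as at least one of its sources has pcts, whereas this sketch waits until every source of a store is ready, so that a store fed from sources at different depths is not computed from partial data.

```python
# (source, target, quantity) triples copied from the question's MOVE_TO relationships.
MOVES = [
    ("A01", "B01", 750), ("A02", "B01", 500),
    ("A03", "B02", 750), ("A04", "B02", 500),
    ("A05", "B03", 100), ("A06", "B03", 200),
    ("A07", "B04", 50), ("A08", "B04", 450),
    ("B01", "C01", 400), ("B02", "C01", 600),
    ("B03", "C02", 100), ("B04", "C02", 200),
    ("C01", "D01", 500), ("C02", "D01", 200),
]
# Starting vectors in the order S1, S2, S3, as produced by the preparation step.
pcts = {"A01": [1.0, 0.0, 0.0], "A02": [1.0, 0.0, 0.0], "A03": [0.0, 1.0, 0.0],
        "A04": [0.0, 0.0, 1.0], "A05": [1.0, 0.0, 0.0], "A06": [1.0, 0.0, 0.0],
        "A07": [0.0, 1.0, 0.0], "A08": [0.0, 0.0, 1.0]}


def propagate(moves, pcts):
    """Fill `pcts` in place for every reachable store; return the number of passes."""
    passes = 0
    while True:
        # Inbound movements of every store that has no vector yet.
        inbound = {}
        for src, dst, qty in moves:
            if dst not in pcts:
                inbound.setdefault(dst, []).append((src, qty))
        # Only stores whose sources all have vectors can be computed in this pass.
        ready = {dst: ins for dst, ins in inbound.items()
                 if all(src in pcts for src, _ in ins)}
        if not ready:  # nothing left to compute: same signal as "0 rows"
            return passes
        for dst, ins in ready.items():
            total = sum(qty for _, qty in ins)
            width = len(pcts[ins[0][0]])
            pcts[dst] = [sum(pcts[src][i] * qty / total for src, qty in ins)
                         for i in range(width)]
        passes += 1


passes = propagate(MOVES, pcts)
print(passes, [round(100 * p, 2) for p in pcts["D01"]])
```

On the question's data this needs three productive passes (the B stores, then the C stores, then D01), mirroring the three non-empty iterations above; the fourth attempt finds nothing to update.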
List the non-zero Supplier percentages for the end Store node(s).
MATCH (store:Store), (sups:Suppliers)
WHERE NOT (store:Store)-[:MOVE_TO]->(:Store) AND HAS(store.pcts)
RETURN store.Name, [i IN RANGE(0,LENGTH(sups.names)-1,1) WHERE store.pcts[i] > 0 | {supplier: sups.names[i], pct: store.pcts[i] * 100}] AS pcts;
Result:
+----------------------------------------------------------------------------------------------------------------------------------+
| store.Name | pcts |
+----------------------------------------------------------------------------------------------------------------------------------+
| "D01" | [{supplier=S1, pct=38.095238095238095},{supplier=S2, pct=27.619047619047617},{supplier=S3, pct=34.285714285714285}] |
+----------------------------------------------------------------------------------------------------------------------------------+
1 row
293 ms
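The mapping from vector positions back to supplier names is the only non-obvious part of the reporting step. A small Python sketch of the same idea follows; report is a hypothetical helper, and the D01 vector is copied from the iteration 3 result above.

```python
def report(names, pcts):
    """Pair each non-zero entry of a pcts vector with its supplier name, as a percentage."""
    return [{"supplier": name, "pct": frac * 100}
            for name, frac in zip(names, pcts) if frac > 0]


names = ["S1", "S2", "S3"]
d01 = [0.38095238095238093, 0.27619047619047615, 0.34285714285714286]
for entry in report(names, d01):
    print(entry["supplier"], round(entry["pct"], 2))
```

Suppliers that contributed nothing to a store are dropped, just as the WHERE clause in the list comprehension of the Cypher query does.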
Clean up (remove all temporary pcts properties and the Suppliers node).
MATCH (s:Store), (sups:Suppliers)
OPTIONAL MATCH (s)-[m:MOVE_TO]-()
REMOVE m.pcts, s.pcts
DELETE sups;
Result:
0 rows
203 ms
+-------------------+
| No data returned. |
+-------------------+
Properties set: 29
Nodes deleted: 1