Modelling graph in Neo4j showing workflow and impact

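I'm new to Neo4j, but I can see a lot of potential in graph databases, particularly for IT data workflows and system impact analysis. However, I'm not sure which design is the right one for maximum efficiency.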

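Consider a system that receives files, processes them, stores them in a database, and makes the data available in various reports. Depending on the file, however, the data may appear in one report but not in another.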

System Architecture and Reality

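An important use case is being able to report the impact on downstream reports when upstream files are missing or the components that process them fail.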

Test Cases

I have come up with 4 designs, 3 of which appear to work, but I'm not sure which is best.

Design 1

Design 2

Design 3

Design 4

Any help or advice on this would be much appreciated.

Code used:

// ---------------------------------------------------------------------------
// Design Experiments
// ---------------------------------------------------------------------------

// 1. Combination of the Workflows with shared nodes where they interact
//    with same Process or DataStore
// ---------------------------------------------------------------------------

// Clear the graph (DETACH DELETE requires Neo4j 2.3+)
MATCH (n) DETACH DELETE n

CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{}]->(f1)
CREATE (p1)-[:PROVIDES{}]->(f2)
CREATE (p2)-[:PROVIDES{}]->(f3)
CREATE (f1)-[:DELIVERS_TO{}]->(pp)
CREATE (pp)-[:DELIVERS_TO{}]->(p)
CREATE (f2)-[:DELIVERS_TO{}]->(p)
CREATE (f3)-[:DELIVERS_TO{}]->(p)
CREATE (p)-[:DELIVERS_TO{}]->(d)
CREATE (d)-[:DELIVERS_TO{}]->(rA)
CREATE (d)-[:DELIVERS_TO{}]->(rB)

// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[r*]->(rp:Report) RETURN rp

// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[r*]->(rp:Report) RETURN rp


// 2. Same node relationship design as #1, but assign a workflow property
//    to each node and relationship as a property array
// ---------------------------------------------------------------------------

// Clear the graph (DETACH DELETE requires Neo4j 2.3+)
MATCH (n) DETACH DELETE n

CREATE (p1:Provider {name: "Provider 1", workflow: ["workflow1","workflow2"]})
CREATE (p2:Provider {name: "Provider 2", workflow: ["workflow3"]})
CREATE (f1:File {name: "File 1", workflow: ["workflow1"]})
CREATE (f2:File {name: "File 2", workflow: ["workflow2"]})
CREATE (f3:File {name: "File 3", workflow: ["workflow3"]})
CREATE (pp:PreProcess {name: "PreProcess", workflow: ["workflow1"]})
CREATE (p:Process {name: "Process", workflow: ["workflow1","workflow2","workflow3"]})
CREATE (d:DataStore {name: "DataStore", workflow: ["workflow1","workflow2","workflow3"]})
CREATE (rA:Report {name: "Report A", workflow: ["workflow1","workflow3"]})
CREATE (rB:Report {name: "Report B", workflow: ["workflow2"]})
CREATE (p1)-[:PROVIDES{workflow: ["workflow1"]}]->(f1)
CREATE (p1)-[:PROVIDES{workflow: ["workflow2"]}]->(f2)
CREATE (p2)-[:PROVIDES{workflow: ["workflow3"]}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: ["workflow1"]}]->(pp)
CREATE (pp)-[:DELIVERS_TO{workflow: ["workflow1"]}]->(p)
CREATE (f2)-[:DELIVERS_TO{workflow: ["workflow2"]}]->(p)
CREATE (f3)-[:DELIVERS_TO{workflow: ["workflow3"]}]->(p)
CREATE (p)-[:DELIVERS_TO{workflow: ["workflow1","workflow2","workflow3"]}]->(d)
CREATE (d)-[:DELIVERS_TO{workflow: ["workflow1","workflow3"]}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: ["workflow2"]}]->(rB)

// Show individual workflows (run each query separately)
// any() replaces filter(), which was removed in Neo4j 4.0
MATCH (p) WHERE any(x IN p.workflow WHERE x = "workflow1") RETURN p
MATCH (p) WHERE any(x IN p.workflow WHERE x = "workflow2") RETURN p
MATCH (p) WHERE any(x IN p.workflow WHERE x = "workflow3") RETURN p

// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"}) WITH a.workflow AS workflows 
MATCH (r:Report) WHERE any(x IN r.workflow WHERE x IN workflows)
RETURN r

// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"}) WITH a.workflow AS workflows 
MATCH (r:Report) WHERE any(x IN r.workflow WHERE x IN workflows)
RETURN r


// 3. Same node relationship design as #1, but create a relationship
//    with a workflow property for each workflow, resulting in multiple
//    relationships between nodes.
// ---------------------------------------------------------------------------

// Clear the graph (DETACH DELETE requires Neo4j 2.3+)
MATCH (n) DETACH DELETE n

CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{workflow: "workflow1"}]->(f1)
CREATE (p1)-[:PROVIDES{workflow: "workflow2"}]->(f2)
CREATE (p2)-[:PROVIDES{workflow: "workflow3"}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pp)
CREATE (pp)-[:DELIVERS_TO{workflow: "workflow1"}]->(p)
CREATE (f2)-[:DELIVERS_TO{workflow: "workflow2"}]->(p)
CREATE (f3)-[:DELIVERS_TO{workflow: "workflow3"}]->(p)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow1"}]->(d)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow2"}]->(d)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow3"}]->(d)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow1"}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow3"}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow2"}]->(rB)


// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[j]->(n)-[r*]->(g)-[t]->(rp:Report) WHERE j.workflow=t.workflow RETURN rp

// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[j]->(n)-[r*]->(g)-[t]->(rp:Report) WHERE j.workflow=t.workflow RETURN rp


// 4. Distinct set of nodes and relationships for each workflow, but all
//    with same node type so can still be matched
// ---------------------------------------------------------------------------

// Clear the graph (DETACH DELETE requires Neo4j 2.3+)
MATCH (n) DETACH DELETE n

CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 1"})
CREATE (p3:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp1:PreProcess {name: "PreProcess"})
CREATE (pc1:Process {name: "Process"})
CREATE (pc2:Process {name: "Process"})
CREATE (pc3:Process {name: "Process"})
CREATE (d1:DataStore {name: "DataStore"})
CREATE (d2:DataStore {name: "DataStore"})
CREATE (d3:DataStore {name: "DataStore"})
CREATE (rA1:Report {name: "Report A"})
CREATE (rB2:Report {name: "Report B"})
CREATE (rA3:Report {name: "Report A"})
CREATE (p1)-[:PROVIDES{workflow: "workflow1"}]->(f1)
CREATE (p2)-[:PROVIDES{workflow: "workflow2"}]->(f2)
CREATE (p3)-[:PROVIDES{workflow: "workflow3"}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pp1)
CREATE (pp1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pc1)
CREATE (f2)-[:DELIVERS_TO{workflow: "workflow2"}]->(pc2)
CREATE (f3)-[:DELIVERS_TO{workflow: "workflow3"}]->(pc3)
CREATE (pc1)-[:DELIVERS_TO{workflow: "workflow1"}]->(d1)
CREATE (pc2)-[:DELIVERS_TO{workflow: "workflow2"}]->(d2)
CREATE (pc3)-[:DELIVERS_TO{workflow: "workflow3"}]->(d3)
CREATE (d1)-[:DELIVERS_TO{workflow: "workflow1"}]->(rA1)
CREATE (d2)-[:DELIVERS_TO{workflow: "workflow2"}]->(rB2)
CREATE (d3)-[:DELIVERS_TO{workflow: "workflow3"}]->(rA3)


// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[j*]->(rp:Report) RETURN rp

// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[j*]->(rp:Report) RETURN rp

Following the suggestion, I have extended Design 1 to include a direct link between files and reports.

Design 1a

// 1a. Combination of the Workflows with shared nodes where they interact
//     with same Process or DataStore.
// ---------------------------------------------------------------------------

// Clear the graph (DETACH DELETE requires Neo4j 2.3+)
MATCH (n) DETACH DELETE n

CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{}]->(f1)
CREATE (p1)-[:PROVIDES{}]->(f2)
CREATE (p2)-[:PROVIDES{}]->(f3)
CREATE (f1)-[:DELIVERS_TO{}]->(pp)
CREATE (pp)-[:DELIVERS_TO{}]->(p)
CREATE (f2)-[:DELIVERS_TO{}]->(p)
CREATE (f3)-[:DELIVERS_TO{}]->(p)
CREATE (p)-[:DELIVERS_TO{}]->(d)
CREATE (d)-[:DELIVERS_TO{}]->(rA)
CREATE (d)-[:DELIVERS_TO{}]->(rB)
CREATE (f1)-[:USED_BY{}]->(rA)
CREATE (f2)-[:USED_BY{}]->(rB)
CREATE (f3)-[:USED_BY{}]->(rA)

// Show impacted reports (and path) if Provider 1 is down
MATCH path = (:Provider{name:'Provider 1'})-[:PROVIDES|USED_BY*]->(r:Report)
RETURN path, r.name AS report

// Show impacted reports (and path) if Provider 2 is down
MATCH path = (:Provider{name:'Provider 2'})-[:PROVIDES|USED_BY*]->(r:Report)
RETURN path, r.name AS report

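You've done a thorough exploration here, and you've found designs that work for your queries. However, they come at a cost.

Design 2 doesn't use relationships at all in its queries, so that solution doesn't seem very graphy. It also requires you to keep the workflow lists on related nodes in sync and up to date. That maintenance cost seems rather high.

Design 3 has a similar cost, but now the properties are on the relationships, and you also have to provide redundant relationships throughout the model, so it is even more costly.

Design 4 requires redundancy for every step used in a process, with each subgraph being a single path from provider to report. While this is easy to understand and query, the redundant nodes and relationships probably make it an impractical approach.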

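The interesting thing about Design 1 is that it provides the right answers, but only for certain questions: those about the impact of the processors, preprocessors, and data stores in the path, and what happens when those hardware and software components fail.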
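As a rough illustration of the kind of question Design 1 does answer, the sketch below asks which reports sit downstream of a failed component. It assumes the Design 1 graph from the question is loaded; the `report` alias is only illustrative.

```cypher
// Which reports are affected if the shared Process component fails?
// Assumes the Design 1 graph from the question has been created.
MATCH (c:Process {name: "Process"})-[:DELIVERS_TO*]->(rp:Report)
RETURN DISTINCT rp.name AS report
// Every file passes through Process, so both Report A and Report B are returned.
```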

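But it doesn't work for data lineage/dependence. Not yet. You may want to consider altering Design 1 so that there are separate paths for data dependencies, alongside the pipeline processes you already have.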
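To see why, here is the question's Design 1 query for Provider 2, lightly condensed, with the result it gives on that graph. Provider 2 only supplies File 3, which feeds Report A, yet every path funnels through the shared Process and DataStore nodes, so the traversal reaches both reports.

```cypher
// Design 1 graph: impact of Provider 2 being down, following every relationship.
MATCH (:Provider {name: "Provider 2"})-[*]->(rp:Report)
RETURN DISTINCT rp.name AS report
// Returns both "Report A" and "Report B", although only Report A
// actually depends on File 3.
```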

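Data dependency may be a different matter. If that is the kind of question you want to ask, you are mostly concerned with inputs and outputs: files to reports. In that case you might consider creating a :DEPENDS_ON relationship between the relevant file and report nodes (the snippets below name it :USED_BY, pointing from file to report).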

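Consider adding this to the end of the creation script for Design 1:

// Run as separate statements, after the creation script has finished
match (f:File), (r:Report{name:'Report A'})
where f.name in ['File 1', 'File 3']
create (r)<-[:USED_BY]-(f);

match (f:File), (r:Report{name:'Report B'})
where f.name in ['File 2']
create (r)<-[:USED_BY]-(f);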

For questions about data lineage, your queries can then use only the relevant relationships, in this case :PROVIDES and :USED_BY.

match path = (:Provider{name:'Provider 1'})-[:PROVIDES|USED_BY*]->(r:Report)
return path, r.name as report

And the reverse: what are the sources for a report?

match path = (p:Provider)-[:PROVIDES|USED_BY*]->(r:Report{name:'Report A'})
return path, p.name as provider
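The same relationships also cover the question's use case of a missing upstream file. A minimal sketch, assuming the :USED_BY relationships above have been created:

```cypher
// Which reports are affected if File 1 is missing?
MATCH (:File {name: 'File 1'})-[:USED_BY]->(r:Report)
RETURN r.name AS report
// With the data above, File 1 is only used by Report A.
```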

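And if your model changes so that intermediate reports are modelled (the outputs of the preprocess and process operations), then you can create the :USED_BY relationships along a chain from :File to :Report, rather than directly between :File and :Report, so you will see the dependency chain through the processing.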
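Purely as an illustration of that last idea, the sketch below chains :USED_BY through one intermediate node in place of the direct File 1 to Report A relationship. The `:Intermediate` label and the name 'PreProcess output' are hypothetical and do not appear in the question's model.

```cypher
// Hypothetical: model the PreProcess output as an intermediate node and chain
// :USED_BY through it instead of linking File 1 directly to Report A.
MATCH (f:File {name: 'File 1'}), (r:Report {name: 'Report A'})
CREATE (f)-[:USED_BY]->(:Intermediate {name: 'PreProcess output'})-[:USED_BY]->(r)
```

The lineage queries above should not need to change, since the variable-length pattern `[:PROVIDES|USED_BY*]` follows the longer chain as well.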