Calculate frequency embedding according to an attribute of one-hop neighbouring nodes

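I have the following graph.

There are two types of nodes: A and B. Nodes labelled A have a property id. Nodes labelled B have a property State. Assume State can only take one of the values [good, bad, average].

How can I use Cypher to generate a frequency embedding for the A nodes? For example, A1 should have the property embedding = [2,0,0] and A2 should have embedding = [2,0,1].

One way is: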

MATCH (n:A)-->(t:B)
WITH collect(t.state) AS state, n
WITH [val IN state WHERE val = "good"] AS good, [val IN state WHERE val = "average"] AS avg, [val IN state WHERE val = "bad"] AS bad, n
WITH size(good) AS goodCount, size(avg) AS avgCount, size(bad) AS badCount, n
SET n.embedding = [goodCount, badCount, avgCount]
RETURN n.key, n.embedding

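This finds every A node together with its B neighbours, collects the B states into one list per A node, and then builds a separate filtered list for each state. Next we take the size of each of those lists and SET the embedding from the counts, in the order [good, bad, average]. The last part returns each A node's key and embedding.

You can check it on this sample data: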

MERGE (a:A{key: 1})
MERGE (b:A{key: 2})
MERGE (c:B{key: 3, state: 'good'})
MERGE (d:B{key: 4, state: 'good'})
MERGE (e:B{key: 5, state: 'average'})
MERGE (f:B{key: 6, state: 'good'})
MERGE (g:B{key: 7, state: 'good'})

MERGE (a)-[:HAS]-(c)
MERGE (a)-[:HAS]-(d)
MERGE (b)-[:HAS]-(e)
MERGE (b)-[:HAS]-(f)
MERGE (b)-[:HAS]-(g)
  • The sample data uses key instead of id

It returns:

╒═══════╤═════════════╕
│"n.key"│"n.embedding"│
╞═══════╪═════════════╡
│1      │[2,0,0]      │
├───────┼─────────────┤
│2      │[2,0,1]      │
└───────┴─────────────┘
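
As a side note, the same counts can be written more compactly without APOC. The following is only a sketch, assuming the same property names as the sample data (key, state) and the same category order [good, bad, average]. It replaces the one-list-per-state step with a single list comprehension over the category names.

```cypher
// Collect the states of the one-hop B neighbours of each A node
MATCH (n:A)-->(t:B)
WITH n, collect(t.state) AS states
// For each category, count how many collected states equal it
SET n.embedding = [s IN ['good', 'bad', 'average'] | size([x IN states WHERE x = s])]
RETURN n.key, n.embedding
```

On the sample data above this should produce the same two embeddings, [2,0,0] and [2,0,1]. Like the original query, it skips A nodes that have no B neighbours, because MATCH returns no row for them.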

If you have many state options, you can do this:

MATCH(n:A)-->(t:B)
WITH apoc.coll.indexOf(["good", "bad", "average"], t.state) as inx, n, [0,0,0] as k
WITH apoc.coll.set(k, inx, 1) AS k, n
WITH collect(k) as kk, n
WITH REDUCE(s = [], sublist IN kk | CASE
    WHEN SIZE(s) = 0 THEN sublist
    ELSE [i IN RANGE(0, SIZE(s)-1) | s[i] + sublist[i]]
    END) AS result, n
SET n.embedding = result
RETURN n.key, n.embedding

Inspired by

To extend this to more state categories, we can build on Nimrod's solution.

Set up a set of reference nodes with your categories:

CREATE (n1:ref{name:'good',order:1})
CREATE (n2:ref{name:'bad',order:2})
CREATE (n3:ref{name:'average',order:3})

The query then becomes

MATCH (c:ref)
WITH c ORDER BY c.order
WITH collect(c.name) AS cn
MATCH (n:A)-->(t:B)
// Build a zero vector with one slot per category instead of hard-coding [0,0,0]
WITH apoc.coll.indexOf(cn, t.state) AS inx, n, [x IN cn | 0] AS k
WITH apoc.coll.set(k, inx, 1) AS k, n
WITH collect(k) AS kk, n
WITH REDUCE(s = [], sublist IN kk | CASE
    WHEN SIZE(s) = 0 THEN sublist
    ELSE [i IN RANGE(0, SIZE(s)-1) | s[i] + sublist[i]]
    END) AS result, n
SET n.embedding = result
RETURN n.key, n.embedding

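Or you can skip the reference nodes and start the query with

WITH ['good', 'bad', 'average'] AS cn

Either way, you can add as many categories as you need, provided the zero vector k has one entry per category.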
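
The APOC-based queries above turn each B state into a one-hot vector and then add those vectors element by element with REDUCE. That summing step can be checked in isolation. The snippet below is a small illustration on literal data: the list kk is made up for the example and stands for the one-hot vectors of one A node with states good, good and average.

```cypher
// Hypothetical input: one-hot vectors for the states good, good, average
WITH [[1,0,0], [1,0,0], [0,0,1]] AS kk
RETURN reduce(s = [], sublist IN kk | CASE
    // The first vector seeds the accumulator
    WHEN size(s) = 0 THEN sublist
    // Later vectors are added position by position
    ELSE [i IN range(0, size(s)-1) | s[i] + sublist[i]]
    END) AS result
```

This returns [2,0,1], the embedding expected for A2 in the question.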