Calculate frequency embedding according to an attribute of one-hop neighbouring nodes
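I have a graph with two types of nodes: A and B.
Nodes labelled A have a property id. Nodes labelled B have a property State. Assume State can only take one of the values [good, bad, average].
How can I use Cypher to generate a frequency embedding for the A nodes? For example, A1 should have the property embedding = [2,0,0] and A2 should have embedding = [2,0,1].
One way to do it is: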
MATCH (n:A)-->(t:B)
WITH collect(t.state) AS state, n
WITH [val IN state WHERE val = "good"] AS good, [val IN state WHERE val = "average"] AS avg, [val IN state WHERE val = "bad"] AS bad, n
WITH size(good) AS goodCount, size(avg) AS avgCount, size(bad) AS badCount, n
// Embedding order is [good, bad, average]
SET n.embedding = [goodCount, badCount, avgCount]
RETURN n.key, n.embedding
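This finds all A nodes and their B neighbours, collects the B states into a list, and then builds a separate list for each state. Next we take the size of each state list and SET the embedding value in the order [good, bad, average]. The last part RETURNs the key and embedding of each A node.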
You can check it on this sample data:
MERGE (a:A{key: 1})
MERGE (b:A{key: 2})
MERGE (c:B{key: 3, state: 'good'})
MERGE (d:B{key: 4, state: 'good'})
MERGE (e:B{key: 5, state: 'average'})
MERGE (f:B{key: 6, state: 'good'})
MERGE (g:B{key: 7, state: 'good'})
MERGE (a)-[:HAS]-(c)
MERGE (a)-[:HAS]-(d)
MERGE (b)-[:HAS]-(e)
MERGE (b)-[:HAS]-(f)
MERGE (b)-[:HAS]-(g)
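Note that the sample data uses key instead of id, and the property is named state in lowercase (Cypher property names are case-sensitive).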
It returns:
╒═══════╤═════════════╕
│"n.key"│"n.embedding"│
╞═══════╪═════════════╡
│1 │[2,0,0] │
├───────┼─────────────┤
│2 │[2,0,1] │
└───────┴─────────────┘
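If you have many state options, you can do this instead (it relies on the APOC library for apoc.coll.indexOf and apoc.coll.set):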
MATCH (n:A)-->(t:B)
// Position of this neighbour's state in the category list
WITH apoc.coll.indexOf(["good", "bad", "average"], t.state) AS inx, n, [0,0,0] AS k
// One-hot vector for this neighbour
WITH apoc.coll.set(k, inx, 1) AS k, n
WITH collect(k) AS kk, n
// Sum the one-hot vectors element-wise
WITH REDUCE(s = [], sublist IN kk | CASE
    WHEN SIZE(s) = 0 THEN sublist
    ELSE [i IN RANGE(0, SIZE(s)-1) | s[i] + sublist[i]]
  END) AS result, n
SET n.embedding = result
RETURN n.key, n.embedding
Inspired by
To extend this to more state categories, we can build on Nimrod's solution.
Set up a set of reference nodes holding your categories:
CREATE (n1:ref {name: 'good', order: 1})
CREATE (n2:ref {name: 'bad', order: 2})
CREATE (n3:ref {name: 'average', order: 3})
The query then becomes:
MATCH (c:ref)
WITH c ORDER BY c.order
WITH collect(c.name) AS cn
MATCH (n:A)-->(t:B)
// Build the zero vector from cn so its length follows the number of categories
WITH apoc.coll.indexOf(cn, t.state) AS inx, n, [x IN cn | 0] AS k
WITH apoc.coll.set(k, inx, 1) AS k, n
WITH collect(k) AS kk, n
WITH REDUCE(s = [], sublist IN kk | CASE
    WHEN SIZE(s) = 0 THEN sublist
    ELSE [i IN RANGE(0, SIZE(s)-1) | s[i] + sublist[i]]
  END) AS result, n
SET n.embedding = result
RETURN n.key, n.embedding
Alternatively, you can start the query with:
with ['good','bad', 'average'] as cn
Either way, you can add as many categories as you need, provided the initial zero vector k has one entry per category.
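For comparison, here is a minimal sketch of the same counting done in plain Cypher, without APOC. It is not taken from the answers above; it assumes the sample data shown earlier (property names key and state) and the category order [good, bad, average]. The OPTIONAL MATCH is an added assumption so that A nodes with no B neighbours still receive an all-zero embedding.

```cypher
// Category list; its order defines the order of the embedding entries
WITH ['good', 'bad', 'average'] AS cn
MATCH (n:A)
// OPTIONAL MATCH keeps A nodes that have no B neighbours
OPTIONAL MATCH (n)-->(t:B)
// collect() skips nulls, so isolated A nodes get an empty list
WITH n, cn, collect(t.state) AS states
// For each category, count how many neighbour states equal it
SET n.embedding = [c IN cn | size([s IN states WHERE s = c])]
RETURN n.key, n.embedding
```

On the sample data this should give [2,0,0] for key 1 and [2,0,1] for key 2, matching the table above. Adding a category only requires extending the cn list.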