How to find edge lists of people who attended the same workshop on the same day in Gremlin?

Question

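I would like to create an edge list that shows connections and the strength of each connection. This sample graph contains 4 people and information about their attendance at workshops A and B, including the day they attended and how many hours they stayed. I want to make the connections through the workshop node: two people are connected if they attended the same workshop on the same day, and the strength of the connection is the minimum of the hours the two of them spent at that workshop.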

Here is the sample graph:

g.addV('person').property(id, '1').property('name', 'Alice').next()
g.addV('person').property(id, '2').property('name', 'Bob').next()
g.addV('person').property(id, '3').property('name', 'Carol').next()
g.addV('person').property(id, '4').property('name', 'David').next()
g.addV('workshop').property(id, '5').property('name', 'A').next()
g.addV('workshop').property(id, '6').property('name', 'B').next()

g.V('1').addE('attended').to(g.V('5')).property('hours', 2).property('day', 'Monday').next()
g.V('1').addE('attended').to(g.V('6')).property('hours', 2).property('day', 'Monday').next()
g.V('2').addE('attended').to(g.V('5')).property('hours', 5).property('day', 'Monday').next()
g.V('3').addE('attended').to(g.V('6')).property('hours', 5).property('day', 'Monday').next()
g.V('4').addE('attended').to(g.V('5')).property('hours', 4).property('day', 'Tuesday').next()
g.V('4').addE('attended').to(g.V('6')).property('hours', 4).property('day', 'Monday').next()
g.V('2').addE('attended').to(g.V('6')).property('hours', 1).property('day', 'Monday')

This would be step 1: for each workshop, the minimum hours for each pair of attendees who attended that workshop on the same day.

Notice that David has no connection through workshop A, because he attended on a different day than Alice and Bob.
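
To make step 1 concrete, here is a minimal Python sketch of the same rule applied to the sample data. It is only an illustration of the expected result, not a Gremlin solution; the `attended` list and the `step1` helper are names made up for this example.

```python
from itertools import combinations

# (person, workshop, day, hours), copied from the sample graph above
attended = [
    ("Alice", "A", "Monday", 2),
    ("Alice", "B", "Monday", 2),
    ("Bob", "A", "Monday", 5),
    ("Carol", "B", "Monday", 5),
    ("David", "A", "Tuesday", 4),
    ("David", "B", "Monday", 4),
    ("Bob", "B", "Monday", 1),
]


def step1(rows):
    """Return (person, other, workshop, min_hours) for same-workshop, same-day pairs."""
    out = []
    for (p1, w1, d1, h1), (p2, w2, d2, h2) in combinations(rows, 2):
        if p1 != p2 and w1 == w2 and d1 == d2:
            # Sort the two names so each unordered pair is listed once.
            out.append((*sorted((p1, p2)), w1, min(h1, h2)))
    return sorted(out)


for row in step1(attended):
    print(row)
```

Workshop A yields a single pair (Alice and Bob), and every other pair comes from workshop B.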

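We could then find the total strength of each relationship by adding up those hours across workshops for each pair. Alice and Bob, for example, then have 3 hours in total across workshops A and B.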
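
The summing step can be sketched the same way. The rows below are the per-workshop minimums that follow from the sample data; `step1_rows` and `total_strength` are hypothetical names used only for this illustration.

```python
from collections import defaultdict

# (person, other, workshop, min_hours) rows derived from the sample data
step1_rows = [
    ("Alice", "Bob", "A", 2),
    ("Alice", "Bob", "B", 1),
    ("Alice", "Carol", "B", 2),
    ("Alice", "David", "B", 2),
    ("Bob", "Carol", "B", 1),
    ("Bob", "David", "B", 1),
    ("Carol", "David", "B", 4),
]


def total_strength(rows):
    """Sum the per-workshop minimum hours for each pair of people."""
    totals = defaultdict(int)
    for person, other, _workshop, hours in rows:
        totals[(person, other)] += hours
    return dict(totals)


for pair, hours in sorted(total_strength(step1_rows).items()):
    print(pair, hours)
```

Only Alice and Bob share more than one workshop, so theirs is the only total that differs from a single step 1 row.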

I am struggling with how to write this in Gremlin for my Neptune graph. I am more familiar with Cypher and could find this type of edge list with something like this:

match (p:Person)-[a:ATTENDED]->(w:Workshop)<-[a2:ATTENDED]-(other:Person)
where a.day = a2.day
and p.name <> other.name
unwind [a.hours, a2.hours] as hrs
with p, w, other, a, min(hrs) as hrs
return p.name, other.name, sum(hrs) as total_hours

This is as far as I have got in Gremlin, but I am not sure how to finish the summarization:

g.V().
    hasLabel('person').as('p').
    outE().as('e').
    inV().as('ws').
    inE('attended').
    where(eq('e')).by('day').as('e2').
    otherV().
    where(neq('p')).as('other').
    select('p','e','other','e2','ws').
    by(valueMap('name','hours','day'))

Can anyone help?

Answer 1

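Given a bit more time, I am fairly sure the query could be simplified. However, starting from where you have got to so far, we can extract the details for each pair of people: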

g.V().
    hasLabel('person').as('p').
    outE().as('e').
    inV().as('ws').
    inE('attended').
    where(eq('e')).by('day').as('e2').
    otherV().
    where(neq('p')).as('other').
    select('p','e','other','e2','ws').
    by(valueMap('name','hours','day').
      by(unfold())).
    project('p1','p2','shared').
      by(select('p').select('name')).
      by(select('other').select('name')).
      by(union(select('e').select('hours'),
               select('e2').select('hours')).min())

This gives us the time each pair spent together at each workshop, but not yet the totals:

==>[p1:Alice,p2:Bob,shared:2]
==>[p1:Alice,p2:Carol,shared:2]
==>[p1:Alice,p2:David,shared:2]
==>[p1:Alice,p2:Bob,shared:1]
==>[p1:Bob,p2:Alice,shared:2]
==>[p1:Bob,p2:Alice,shared:1]
==>[p1:Bob,p2:Carol,shared:1]
==>[p1:Bob,p2:David,shared:1]
==>[p1:Carol,p2:Alice,shared:2]
==>[p1:Carol,p2:David,shared:4]
==>[p1:Carol,p2:Bob,shared:1]
==>[p1:David,p2:Alice,shared:2]
==>[p1:David,p2:Carol,shared:4]
==>[p1:David,p2:Bob,shared:1]
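
For readers more used to Cypher, it may help to see the traversal as a nested loop over the `attended` edges. The Python sketch below mirrors the steps under that reading; `edges` and `directed_pairs` are names invented for this illustration, and the row order is not meant to match the console output.

```python
# (person, workshop, day, hours) rows standing in for the 'attended' edges
edges = [
    ("Alice", "A", "Monday", 2),
    ("Alice", "B", "Monday", 2),
    ("Bob", "A", "Monday", 5),
    ("Bob", "B", "Monday", 1),
    ("Carol", "B", "Monday", 5),
    ("David", "A", "Tuesday", 4),
    ("David", "B", "Monday", 4),
]


def directed_pairs(edges):
    """Emit one row per (person, other) visit, like the project() step."""
    rows = []
    for p, ws, day, hours in edges:              # outE().as('e'), inV().as('ws')
        for other, ws2, day2, hours2 in edges:   # inE('attended').as('e2')
            if ws2 != ws or day2 != day:         # where(eq('e')).by('day')
                continue
            if other == p:                       # where(neq('p'))
                continue
            rows.append({"p1": p, "p2": other, "shared": min(hours, hours2)})
    return rows


rows = directed_pairs(edges)
print(len(rows))
```

Because the outer loop starts from every person, each pair is visited from both ends, which is why Alice and Bob appear both as Alice-Bob and as Bob-Alice.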

All that is left is to produce the final results. One way to do this is with a group step.

g.V().
    hasLabel('person').as('p').
    outE().as('e').
    inV().as('ws').
    inE('attended').
    where(eq('e')).by('day').as('e2').
    otherV().
    where(neq('p')).as('other').
    select('p','e','other','e2','ws').
    by(valueMap('name','hours','day').
      by(unfold())).
    project('p1','p2','shared').
      by(select('p').select('name')).
      by(select('other').select('name')).
      by(union(select('e').select('hours'),
               select('e2').select('hours')).min()).
    group().
      by(union(select('p1'),select('p2')).fold()).
      by(select('shared').sum())

==>[[Bob,Carol]:1,[David,Alice]:2,[Carol,Alice]:2,[Carol,Bob]:1,[Alice,Bob]:3,[Carol,David]:4,[Bob,Alice]:3,
[David,Bob]:1,[Bob,David]:1,[David,Carol]:4,[Alice,Carol]:2,[Alice,David]:2]

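Adding unfold makes the results a little easier to read. I did not try to factor out the duplicates such as Bob-Alice and Alice-Bob. If you need to do that in the query, an order step could be added after the group is created, followed by dedup.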

g.V().
    hasLabel('person').as('p').
    outE().as('e').
    inV().as('ws').
    inE('attended').
    where(eq('e')).by('day').as('e2').
    otherV().
    where(neq('p')).as('other').
    select('p','e','other','e2','ws').
    by(valueMap('name','hours','day').
      by(unfold())).
    project('p1','p2','shared').
      by(select('p').select('name')).
      by(select('other').select('name')).
      by(union(select('e').select('hours'),
               select('e2').select('hours')).min()).
    group().
      by(union(select('p1'),select('p2')).fold()).
      by(select('shared').sum()).
    unfold()

==>[Bob, Carol]=1
==>[David, Alice]=2
==>[Carol, Alice]=2
==>[Carol, Bob]=1
==>[Alice, Bob]=3
==>[Carol, David]=4
==>[Bob, Alice]=3
==>[David, Bob]=1
==>[Bob, David]=1
==>[David, Carol]=4
==>[Alice, Carol]=2
==>[Alice, David]=2
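
If the duplicate directions are removed on the client instead of in the query, one possible approach is to sort each key and keep one entry per unordered pair. The sketch below assumes the grouped result has been read into a Python dict; `directed` and `undirected` are hypothetical names, and the values are copied from the output above.

```python
# Grouped totals keyed by (p1, p2), copied from the query output above
directed = {
    ("Bob", "Carol"): 1, ("David", "Alice"): 2, ("Carol", "Alice"): 2,
    ("Carol", "Bob"): 1, ("Alice", "Bob"): 3, ("Carol", "David"): 4,
    ("Bob", "Alice"): 3, ("David", "Bob"): 1, ("Bob", "David"): 1,
    ("David", "Carol"): 4, ("Alice", "Carol"): 2, ("Alice", "David"): 2,
}


def undirected(totals):
    """Collapse (a, b) and (b, a) into a single alphabetically ordered key."""
    merged = {}
    for (p1, p2), hours in totals.items():
        key = tuple(sorted((p1, p2)))
        # Both directions already carry the same total, so keep the first
        # one seen rather than adding them together.
        merged.setdefault(key, hours)
    return merged


for pair, hours in sorted(undirected(directed).items()):
    print(pair, hours)
```

Keeping one value rather than summing matters here: adding the two directions together would double every total.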

gremlin

amazon-neptune