Is there a way to reuse aggregate steps?

Question

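I have a graph database that stores different types of entities, and I am building an API to fetch entities from the graph. It is somewhat complicated, because for each type of entity there is a set of rules for fetching related entities along with the original entity. To do this, I use an aggregate step to collect all the related entities I am fetching into a single collection.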

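An additional requirement is to fetch a batch of entities (and their related entities). I planned to do this by changing the has step that finds the entity to use P.within, and by mapping the aggregation over each entity found. This works as long as I fetch a single entity. If I fetch two entities, however, the result set for the first entity is correct, but the result set for the second entity contains the first entity's results as well as its own. I think this is because the second entity simply adds to the first entity's aggregated collection, since the aggregate key is the same. I have not found any way to clear the collection between the first and the second entity, nor any way to use a dynamic aggregate side-effect key.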

Code:


return graph.traversal().V()
    .hasLabel(ENTITY_LABEL)
    .has("entity_ref", P.within(entityRefs)) // entityRefs is a list of entities I am looking for
    .flatMap(
        __.aggregate("match")
          .sideEffect(
                // The logic that applies the rules lives here. It will add items to "match" collection.
          )
          .select("match")
          .fold()
    )
    .toStream()
    ...

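The result should be a list of entity lists: the first list in the outer list contains the results for the first entity in entityRefs, the second list contains the results for the second entity in entityRefs, and so on.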

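Example: I want to fetch the vertices for entity refs A and B together with their related entities. Suppose I expect the result [[A, C], [B, D, E]]; instead I get [[A, C], [A, C, B, D, E]] (the second result contains the first).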

Questions:

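Is there a way to clear the "match" collection after the select?
Is there a way to use dynamic side-effect keys, so that I create one collection per entityRef?
Is there another way to do this?
Am I looking at the problem the wrong way?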
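The symptom in the example, and the first two ideas in the question, can be modelled with ordinary Java collections. This is a sketch, not Gremlin, and it makes no claim about how TinkerPop stores side effects internally. It assumes the hypothetical relations A → C and B → D, E from the example and processes one entity at a time, which is the assumption the "clear after select" idea relies on. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the symptom (no TinkerPop involved); names are hypothetical.
class MatchCollectionDemo {

    // Hypothetical rule results from the example: A relates to C, B relates to D and E.
    static final Map<String, List<String>> RELATED = Map.of(
            "A", List.of("C"),
            "B", List.of("D", "E"));

    // One collection shared by every start entity, like a single side-effect key "match":
    // each snapshot also contains whatever earlier entities added.
    static List<List<String>> sharedCollection(List<String> refs) {
        List<String> match = new ArrayList<>();
        List<List<String>> results = new ArrayList<>();
        for (String ref : refs) {
            match.add(ref);
            match.addAll(RELATED.getOrDefault(ref, List.of()));
            results.add(new ArrayList<>(match)); // snapshot of the shared collection
        }
        return results;
    }

    // Question 1 modelled: clear the shared collection after it has been read.
    static List<List<String>> clearedCollection(List<String> refs) {
        List<String> match = new ArrayList<>();
        List<List<String>> results = new ArrayList<>();
        for (String ref : refs) {
            match.add(ref);
            match.addAll(RELATED.getOrDefault(ref, List.of()));
            results.add(new ArrayList<>(match));
            match.clear(); // only safe because entities are handled strictly one by one here
        }
        return results;
    }

    // Question 2 modelled: one collection per entity, i.e. a "dynamic key".
    static Map<String, List<String>> perEntityCollections(List<String> refs) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String ref : refs) {
            List<String> match = byKey.computeIfAbsent(ref, k -> new ArrayList<>());
            match.add(ref);
            match.addAll(RELATED.getOrDefault(ref, List.of()));
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<String> refs = List.of("A", "B");
        System.out.println(sharedCollection(refs));     // [[A, C], [A, C, B, D, E]]
        System.out.println(clearedCollection(refs));    // [[A, C], [B, D, E]]
        System.out.println(perEntityCollections(refs)); // {A=[A, C], B=[B, D, E]}
    }
}
```

The shared variant reproduces the reported output exactly; both other variants give the expected result because no entity can see another entity's additions.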

Edit: here is a minimal example of the problem. The graph is set up as follows:

g.addV('entity').property('id',1).property('type', 'customer').as('1').
  addV('entity').property('id',2).property('type', 'email').as('2').
  addV('entity').property('id',6).property('type', 'customer').as('6').
  addV('entity').property('id',3).property('type', 'product').as('3').
  addV('entity').property('id',4).property('type', 'subLocation').as('4').
  addV('entity').property('id',7).property('type', 'location').as('7').
  addV('entity').property('id',5).property('type', 'productGroup').as('5').
  addE('AKA').from('1').to('2').
  addE('AKA').from('2').to('6').
  addE('HOSTED_AT').from('3').to('4').
  addE('LOCATED_AT').from('4').to('7').
  addE('PART_OF').from('3').to('5').iterate()

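I want to fetch a batch of entities, given their IDs, together with their related entities. Which related entities are returned depends on the type of the original entity. My current query looks like this (slightly modified for this example):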

g.V().
    hasLabel('entity').
    has('id', P.within(1,3)).
    flatMap(
        aggregate('match').
        sideEffect(
            choose(values('type')).
                option('customer',
                    both('AKA').
                        has('type', P.within('email', 'phone')).
                        sideEffect(
                            has('type', 'email').
                                aggregate('match')).
                        both('AKA').
                        has('type', 'customer').
                        aggregate('match')).
                option('product',
                    bothE('HOSTED_AT', 'PART_OF').
                        choose(label()).
                        option('PART_OF',
                            bothV().
                                has('type', P.eq('productGroup')).
                                aggregate('match')).
                        option('HOSTED_AT',
                            bothV().
                                has('type', P.eq('subLocation')).
                                aggregate('match').
                                both('LOCATED_AT').
                                has('type', P.eq('location')).
                                aggregate('match')))
        ).
        select('match').
        unfold().
        dedup().
        values('id').
        fold()
    ).
    toList()

If I fetch only one entity, I get the correct result: for id 1 I get [1,2,6], and for id 3 I get [3,5,4,7]. However, when I fetch both at once, I get:

==>[3,5,4,7]
==>[3,5,4,7,1,2,6]

The first result is correct, but the second contains the results for both IDs.

Answer 1

You can take advantage of the group().by(key).by(value) step (honestly not very well documented, but seemingly powerful).

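This lets you drop the aggregate() side-effect step that is causing you trouble. As an alternative way of collecting the vertices matched by several traversals into one list, I use union().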

An example using the graph you posted (for brevity I only include the customer option):

// Java; assumes: import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*;
g.V()
    .hasLabel("entity")
    .has("id", P.within(1, 3))
    .<Object, List<Entity>>group()
      .by("id")
      .by(choose(values("type"))
        .option("customer", union(
            identity(),
            both("AKA").has("type", "email"),
            both("AKA").has("type", P.within("email", "phone")).both("AKA").has("type", "customer"))
          .dedup()                                   // the start vertex is reached again via AKA
          .map(t -> new Entity(t.get()))             // Or whatever business class you have
          .fold())                                   // Important: collects all 3 union branches into one list
        .option("product", union()))                 // Product rules omitted for brevity
    .next();

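The obvious drawback of this traversal is that the code is somewhat more verbose: it spells out the walk across the customer's 'AKA' edges twice, whereas your traversal does so only once.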

However, it does keep the by(value) part of the group() step separate for each key, which is exactly what we want.
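The per-key separation can be checked against the example graph without a database. The sketch below is a plain-Java model, not TinkerPop: the key is the vertex id (the by("id") part), and the value is identity() plus the rule branches, deduplicated in encounter order and folded into one list per key. The graph data and both sets of rules are copied from the question's query (the answer omits the product rules); the class and helper names such as GroupByDemo and both are hypothetical, and the order in which branches are walked is an assumption chosen to match the orders shown in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model (no TinkerPop) of group().by(key).by(value); names are hypothetical.
class GroupByDemo {

    record Edge(String label, int from, int to) {}

    // The example graph from the question: vertex id -> type, plus its edges.
    static final Map<Integer, String> TYPE = Map.of(
            1, "customer", 2, "email", 6, "customer", 3, "product",
            4, "subLocation", 7, "location", 5, "productGroup");

    static final List<Edge> EDGES = List.of(
            new Edge("AKA", 1, 2), new Edge("AKA", 2, 6),
            new Edge("HOSTED_AT", 3, 4), new Edge("LOCATED_AT", 4, 7),
            new Edge("PART_OF", 3, 5));

    // Rough equivalent of both(label): neighbours over edges with that label, either direction.
    static List<Integer> both(int v, String label) {
        List<Integer> out = new ArrayList<>();
        for (Edge e : EDGES) {
            if (!e.label().equals(label)) continue;
            if (e.from() == v) out.add(e.to());
            if (e.to() == v) out.add(e.from());
        }
        return out;
    }

    // The by(value) part: a fresh collection per start vertex, so nothing leaks between keys.
    static List<Integer> related(int v) {
        Set<Integer> acc = new LinkedHashSet<>(); // keeps encounter order and drops duplicates
        acc.add(v);                               // identity()
        switch (TYPE.get(v)) {
            case "customer" -> {
                for (int alias : both(v, "AKA")) {
                    String t = TYPE.get(alias);
                    if (!t.equals("email") && !t.equals("phone")) continue;
                    if (t.equals("email")) acc.add(alias);
                    for (int c : both(alias, "AKA"))
                        if (TYPE.get(c).equals("customer")) acc.add(c);
                }
            }
            case "product" -> {
                for (int g : both(v, "PART_OF"))
                    if (TYPE.get(g).equals("productGroup")) acc.add(g);
                for (int s : both(v, "HOSTED_AT")) {
                    if (!TYPE.get(s).equals("subLocation")) continue;
                    acc.add(s);
                    for (int l : both(s, "LOCATED_AT"))
                        if (TYPE.get(l).equals("location")) acc.add(l);
                }
            }
            default -> { }
        }
        return new ArrayList<>(acc); // the fold()
    }

    // The group().by("id") part: one entry per requested id.
    static Map<Integer, List<Integer>> group(List<Integer> ids) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for (int id : ids) result.put(id, related(id));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(group(List.of(1, 3))); // {1=[1, 2, 6], 3=[3, 5, 4, 7]}
    }
}
```

Because every key starts from an empty set, the entry for id 3 cannot pick up vertices found for id 1, which is the property the answer relies on.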
