Speeding up SPARQL query on GraphDB

Question

I am trying to speed up and optimize this query:

select distinct ?root where { 
    ?root a :Root ;
          :hasnode* ?node ;
          :hasnode* ?node2 .

    ?node a :Node ;
           :hasAnnotation ?ann .
    ?ann :hasReference ?ref .
    ?ref a :ReferenceType1 .

    ?node2 a :Node ;
            :hasAnnotation ?ann2 .
    ?ann2 :hasReference ?ref2 .
    ?ref2 a :ReferenceType2 .

}

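Basically, I am analyzing some trees, and I want to get all the trees (i.e. the trees' roots) that have at least a couple of underlying nodes matching a pattern like this one: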

?node_x a :Node ;
       :hasAnnotation ?ann_x .
?ann_x :hasReference ?ref_x .
?ref_x a :ReferenceTypex .

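once with x = 1 and once with x = 2.

Since a node in my graph can have at most one :hasAnnotation predicate, I don't have to specify that the two nodes must be different.

The problem

The query above describes what I need, but it performs very badly. After minutes and minutes of execution, it is still running.

My (ugly) solution: split it in half

I noticed that if I look for one node pattern at a time, I get my result in a few seconds (!).

Sadly, my current approach consists of running the following type of query twice: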

select distinct ?root where { 
    ?root a :Root ;
          :hasnode* ?node .

    ?node a :Node ;
           :hasAnnotation ?ann_x .
    ?ann_x :hasReference ?ref_x .
    ?ref_x a :ReferenceTypex .
}

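once with x = 1 and once with x = 2.

I save the partial results (i.e. the ?root values) in two sets, say R1 and R2, and finally compute the intersection of those result sets.

Is there a way to speed up my initial approach so that I get the result using SPARQL alone?

PS: I am working with GraphDB.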

Answer 1

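OK, putting together the auto-hint :) and Stanislav's suggestion, I came up with a solution.

Solution 1: nested query

Nesting the query in the following way, I get the result in 15 s.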

select distinct ?root where { 
    ?root a :Root ;
          :hasnode* ?node .
    ?node a :Node ;
          :hasAnnotation ?ann .
    ?ann :hasReference ?ref .
    ?ref a :ReferenceType1 .
    {
        select distinct ?root where { 
            ?root a :Root ;
                  :hasnode* ?node2 .
            ?node2 a :Node ;
                   :hasAnnotation ?ann2 .
            ?ann2 :hasReference ?ref2 .
            ?ref2 a :ReferenceType2 .
        }
    }
}

Solution 2: grouping into {}

Grouping the parts into {}, as suggested by Stanislav, took 60 s.

select distinct ?root where { 
    {
    ?root a :Root ;
          :hasnode* ?node .

    ?node a :Node ;
           :hasAnnotation ?ann .
    ?ann :hasReference ?ref .
    ?ref a :ReferenceType1 .
    }
    {
    ?root a :Root ;
          :hasnode* ?node2 .

    ?node2 a :Node ;
           :hasAnnotation ?ann2 .
    ?ann2 :hasReference ?ref2 .
    ?ref2 a :ReferenceType2 .
    }
}

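In the first case, GraphDB's optimizer probably builds a more efficient query plan for my data (explanations are welcome).

I used to think of SPARQL in a 'declarative' way, but performance seems to vary widely depending on how you write the query. Coming from SQL, it seems to me that this variability is much larger than what happens in the relational world.

However, reading this post, it seems I am not familiar enough with the dynamics of SPARQL optimizers. :)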

Answer 2

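Without knowing the specific dataset, I can only give you some general guidance on how to optimize the query:

Avoid using DISTINCT for large datasets

The GraphDB query optimizer will not automatically rewrite the query to use EXISTS for the patterns that do not participate in the projection. What the query means is "check that at least one such pattern exists", not "give me all the bindings and then eliminate the duplicate results".

Materialize the property paths

GraphDB has a very efficient forward-chaining reasoner and a comparatively less optimized property path expansion. If you are not concerned about write/data update performance, I suggest declaring :hasnode as a transitive property (see owl:TransitiveProperty), which eliminates the property path wildcard. This will speed up the query many times over.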
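One way to make that declaration is a SPARQL update along the following lines. This is only a sketch: the `http://example.org/` namespace bound to `:` is a placeholder (the question never shows its prefix), and it assumes the repository was created with a ruleset that covers owl:TransitiveProperty, since the statement has no effect under plain RDFS.

```sparql
PREFIX :    <http://example.org/>   # placeholder; replace with the namespace of your data
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Declare :hasnode as transitive so that the reasoner can materialize
# every root-to-descendant :hasnode link at load time.
INSERT DATA {
    :hasnode a owl:TransitiveProperty .
}
```

One difference to keep in mind: `:hasnode*` also matches the zero-length path (the root itself), whereas plain `:hasnode` over the materialized closure does not.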

Your final query should look like this:

select ?root where { 
    ?root a :Root ;
          :hasnode ?node ;
          :hasnode ?node2 .

    FILTER (?node != ?node2)

    FILTER EXISTS {
        ?node a :Node ;
               :hasAnnotation ?ann .
        ?ann :hasReference ?ref .
        ?ref a :ReferenceType1 .
    }

    FILTER EXISTS {
        ?node2 a :Node ;
                :hasAnnotation ?ann2 .
        ?ann2 :hasReference ?ref2 .
        ?ref2 a :ReferenceType2 .
    }
}
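If materializing the closure is not an option, the EXISTS idea can in principle be applied on its own, keeping the property path from the question. The following is an untested sketch, not something measured in either answer, and the namespace bound to `:` is a placeholder.

```sparql
PREFIX : <http://example.org/>   # placeholder namespace (assumption)

# Only ?root is bound in the outer pattern, so no DISTINCT is used here
# (assuming each root matches `?root a :Root` exactly once).
SELECT ?root WHERE {
    ?root a :Root .

    # At least one descendant annotated with a :ReferenceType1 reference.
    FILTER EXISTS {
        ?root :hasnode* ?node .
        ?node a :Node ;
              :hasAnnotation ?ann .
        ?ann :hasReference ?ref .
        ?ref a :ReferenceType1 .
    }

    # At least one descendant annotated with a :ReferenceType2 reference.
    FILTER EXISTS {
        ?root :hasnode* ?node2 .
        ?node2 a :Node ;
               :hasAnnotation ?ann2 .
        ?ann2 :hasReference ?ref2 .
        ?ref2 a :ReferenceType2 .
    }
}
```

Whether this beats the nested-query version on a given dataset would have to be checked against the actual query plan.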
