How to remove duplicates in a SPARQL query

Question

I wrote this query, which returns a list of married couples that meet a specific condition (run at http://live.dbpedia.org/sparql):

SELECT DISTINCT ?actor ?person2 ?cnt
WHERE
{
{
    select DISTINCT ?actor ?person2 (count (?film) as ?cnt) 
    where { 
        ?film    dbo:starring ?actor .
        ?actor dbo:spouse ?person2. 
        ?film    dbo:starring ?person2.
    }
    order by ?actor
}
FILTER (?cnt >9)
}

The problem is that some rows are duplicated. For example:

http://dbpedia.org/resource/George_Burns http://dbpedia.org/resource/Gracie_Allen 12

http://dbpedia.org/resource/Gracie_Allen http://dbpedia.org/resource/George_Burns 12

How can I remove these duplicates? I tried adding a gender to ?actor, but that broke the current results.

Answer 1

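These are not true duplicates, of course, because each couple can be looked at from either side. If you want to collapse them, the fix is to add a filter. It is a bit of a hack, but it keeps only one of the two "same" rows: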

SELECT DISTINCT ?actor ?person2 ?cnt
WHERE
{
{
    select DISTINCT ?actor ?person2 (count (?film) as ?cnt) 
    where { 
        ?film    dbo:starring ?actor .
        ?actor dbo:spouse ?person2. 
        ?film    dbo:starring ?person2.
        FILTER (?actor < ?person2)
    }
    order by ?actor
}
FILTER (?cnt >9)
}

Answer 2

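Answer 1 shows the typical way to exclude this kind of pseudo-duplicate. The results are not actually duplicates, because in one row George Burns, for example, is the ?actor, and in the other he is the ?person2. In many cases you can add a filter that requires the two things to be ordered, and that removes the mirrored rows. For example, when you have data like this: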

:a :likes :b .
:a :likes :c .

and you then run this query:

select ?x ?y where { 
  :a :likes ?x, ?y .
}
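
On the two triples above, ?x and ?y are bound independently, so the pattern yields four rows: (:b, :b), (:b, :c), (:c, :b) and (:c, :c). As a rough sketch for trying this without loading any data, the same four bindings can be inlined with VALUES; the http://example.org/ prefix is an assumption made only for this illustration, and the commented-out filter compares string forms so that it does not depend on how a given engine orders IRIs.

```sparql
# Hypothetical prefix, chosen only so the example is self-contained.
PREFIX : <http://example.org/>

SELECT ?x ?y
WHERE {
  # The four solutions that ":a :likes ?x, ?y" produces on the sample data.
  VALUES (?x ?y) { (:b :b) (:b :c) (:c :b) (:c :c) }

  # Uncomment to keep only rows where ?x sorts strictly before ?y,
  # which drops the self-pairs and one of each mirrored pair.
  # FILTER (STR(?x) < STR(?y))
}
```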

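You can add filter(?x < ?y) to enforce an ordering between ?x and ?y, which removes these pseudo-duplicates. In this case, however, it is a little trickier, because ?actor and ?person2 are not found using the same criteria. If DBpedia contains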

:PersonB dbo:spouse :PersonA

but not

:PersonA dbo:spouse :PersonB

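then the simple filter will not work, because you will never find a triple in which the subject PersonA is less than the object PersonB. So in this case you also need to modify the query slightly to make the conditions symmetric: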

select distinct ?actor ?spouse (count(?film) as ?count) {
  ?film dbo:starring ?actor, ?spouse .
  ?actor dbo:spouse|^dbo:spouse ?spouse .
  filter(?actor < ?spouse)
}
group by ?actor ?spouse
having (count(?film) > 9)
order by ?actor

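(This query also shows that no subquery is needed here: you can use having to "filter" on aggregate values.) The important part, though, is the property path dbo:spouse|^dbo:spouse, which finds a value for ?spouse such that either ?actor dbo:spouse ?spouse or ?spouse dbo:spouse ?actor. This makes the relationship symmetric, so you are guaranteed to get all the pairs even if the relation is declared in only one direction.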
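
Two portability caveats may be worth keeping in mind; the variant below is a hedged sketch, not part of the original answer. First, the SPARQL 1.1 operator table defines < for numbers, strings, booleans and dateTimes but not for IRIs, so filter(?actor < ?spouse) depends on the endpoint extending the operator; comparing STR() values should give the same ordering test on stricter engines. Second, an alternative path can match the same pair once per direction when both spouse triples exist, which may count each film twice, so the sketch counts distinct films. The dbo: prefix is declared explicitly on the assumption that the endpoint does not predefine it.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?actor ?spouse (COUNT(DISTINCT ?film) AS ?count)
WHERE {
  # Films in which both people appear.
  ?film dbo:starring ?actor, ?spouse .

  # Symmetric spouse relation: matches whichever direction is declared.
  ?actor dbo:spouse|^dbo:spouse ?spouse .

  # Keep one orientation per couple by comparing the IRIs as strings.
  FILTER (STR(?actor) < STR(?spouse))
}
GROUP BY ?actor ?spouse
# DISTINCT guards against a film being counted once per matching path branch.
HAVING (COUNT(DISTINCT ?film) > 9)
ORDER BY ?actor
```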


sparql

dbpedia