SPARQL: take the first n results per group while respecting the aggregated value
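Yesterday I asked a question about how to take the first n values for each value of a variable. The solution given there works in every case except when I have an aggregation. That is the new scenario here: I don't just want the first n items, I also want them ordered by the aggregated value.
Context
I have a user who has liked a number of items in the past. I want to get the artists (authors) of the items that user liked, and then recommend other items by the same artists.
Data
This is the smallest data set I could create for this question. I can't make it any smaller, because I want you to see the aggregation, which is the part I am asking about.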
@prefix : <http://example.org/rs#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
:algorithm1 :hasArtist :artist_anita .
:book4 :hasArtist :artist_anita .
:book3 :hasArtist :artist_anita .
:animalFarm :hasArtist :george_orwell .
:book5 :hasArtist :artist_anita .
:book6 :hasArtist :george_orwell .
:algorithm1 a :RecommendableClass .
:book4 a :RecommendableClass .
:book3 a :RecommendableClass .
:animalFarm a :RecommendableClass .
:book5 a :RecommendableClass .
:book6 a :RecommendableClass .
:ania :likes :book5 , :book6 .
:user1 :hasRated [ :ratesBy "0.8"^^xsd:float ; :aboutItem :algorithm1] .
:user2 :hasRated [:ratesBy "0.9"^^xsd:float ; :aboutItem :algorithm1 ] .
:user1 :hasRated [:ratesBy "0.5"^^xsd:float ; :aboutItem :book3] .
:user3 :hasRated [:ratesBy "0.6"^^xsd:float ; :aboutItem :book3] .
:user2 :hasRated [:ratesBy "0.9"^^xsd:float ; :aboutItem :book4] .
:user4 :hasRated [:ratesBy "0.3"^^xsd:float ; :aboutItem :book4] .
:user3 :hasRated [:ratesBy "0.9"^^xsd:float ; :aboutItem :animalFarm] .
:user5 :hasRated [:ratesBy "0.1"^^xsd:float ; :aboutItem :animalFarm] .
Query
PREFIX : <http://example.org/rs#>
select ?item (AVG(?ratingValue) as ?averageRatingValue) ?value
where
{
  { VALUES ?user { :ania }
    ?anotherUser :hasRated [:aboutItem ?item ; :ratesBy ?ratingValue ] .
    {
      SELECT ?item ?value ?countableProperty
      WHERE
      {
        VALUES ?user { :ania }
        ?item a :RecommendableClass ; :hasArtist ?value .
        {
          SELECT ?value (count(*) AS ?count)
          WHERE
          {
            VALUES ?user { :ania }
            ?user :likes [:hasArtist ?value] .
          }
          GROUP BY ?value
          ORDER BY DESC(?count)
          LIMIT 10
        }
        FILTER NOT EXISTS {?user :hasRated ?rating .
                           ?rating :aboutItem ?item
                          }
      }
    }
    filter (?anotherUser != ?user)
  }
}
group by ?item ?value
having (?averageRatingValue > 0.2)
order by ?value desc(?averageRatingValue)
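Result
In the result, artist_anita has three rows and george_orwell has only one. But I want just one item for each of them, chosen by the average rating, of course.
Can you help?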
First, some simpler sample data:
@prefix : <urn:ex:> .
:artist1 :p
    [ :item :item1 ; :rating 0.3, 0.4 ] ,
    [ :item :item2 ; :rating 0.8, 0.7, 0.9 ] ,
    [ :item :item3 ; :rating 0.9 ] .
:artist2 :p
    [ :item :item2 ; :rating 0.4, 0.45 ] ,
    [ :item :item3 ; :rating 0.1, 0.2 ] ,
    [ :item :item4 ; :rating 0.7 ] .
Here is a query that finds the average rating of each item for each artist:
prefix : <urn:ex:>
select ?item (avg(?rating) as ?avgRating) ?artist {
  ?artist :p [ :item ?item ; :rating ?rating ] .
}
group by ?artist ?item
---------------------------------
| item   | avgRating | artist   |
=================================
| :item3 | 0.9       | :artist1 |
| :item2 | 0.8       | :artist1 |
| :item1 | 0.35      | :artist1 |
| :item4 | 0.7       | :artist2 |
| :item2 | 0.425     | :artist2 |
| :item3 | 0.15      | :artist2 |
---------------------------------
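To make the grouping step concrete, here is a minimal Python sketch of the same computation outside SPARQL. The `RATINGS` list and the `average_ratings` helper are illustrative names, not part of the original answer; the values are transcribed by hand from the sample data, and `Decimal` is used on the assumption that the unquoted literals such as `0.3` are read as `xsd:decimal`, which is how Turtle types them.

```python
from collections import defaultdict
from decimal import Decimal

# (artist, item, rating) rows transcribed from the sample data (hypothetical layout).
# Ratings are kept as strings so Decimal can represent them exactly.
RATINGS = [
    ("artist1", "item1", "0.3"), ("artist1", "item1", "0.4"),
    ("artist1", "item2", "0.8"), ("artist1", "item2", "0.7"),
    ("artist1", "item2", "0.9"),
    ("artist1", "item3", "0.9"),
    ("artist2", "item2", "0.4"), ("artist2", "item2", "0.45"),
    ("artist2", "item3", "0.1"), ("artist2", "item3", "0.2"),
    ("artist2", "item4", "0.7"),
]


def average_ratings(ratings):
    """Group by (artist, item) and average, like GROUP BY ?artist ?item with AVG."""
    groups = defaultdict(list)
    for artist, item, rating in ratings:
        groups[(artist, item)].append(Decimal(rating))
    return {key: sum(values) / len(values) for key, values in groups.items()}


averages = average_ratings(RATINGS)
for (artist, item), avg in sorted(averages.items()):
    print(artist, item, avg)
```

Each (artist, item) pair ends up with one average, which matches the six rows of the table.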
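Now, to get the top n items by avgRating, you need a second subquery so that, for each item, you can count how many items by the same artist have an average rating at least as high (the item itself included). That count is the item's rank within the artist: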
prefix : <urn:ex:>
select ?item (count(?item2) as ?rank) ?avgRating ?artist {
  {
    select ?item (avg(?rating) as ?avgRating) ?artist {
      ?artist :p [ :item ?item ; :rating ?rating ] .
    }
    group by ?artist ?item
  }
  {
    select ?item2 (avg(?rating) as ?avgRating2) ?artist {
      ?artist :p [ :item ?item2 ; :rating ?rating ] .
    }
    group by ?artist ?item2
  }
  filter (?avgRating <= ?avgRating2)
}
group by ?item ?artist ?avgRating
----------------------------------------
| item   | rank | avgRating | artist   |
========================================
| :item3 | 1    | 0.9       | :artist1 |
| :item2 | 2    | 0.8       | :artist1 |
| :item1 | 3    | 0.35      | :artist1 |
| :item4 | 1    | 0.7       | :artist2 |
| :item2 | 2    | 0.425     | :artist2 |
| :item3 | 3    | 0.15      | :artist2 |
----------------------------------------
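The rank column comes from a self-join: every row is paired with the rows of the same artist, and the pairs that pass the filter are counted. A small Python sketch of that idea, using the averages from the table as input (`ROWS` and `rank_within_artist` are hypothetical names introduced only for this illustration):

```python
# (artist, item, avgRating) rows copied from the average-rating table.
ROWS = [
    ("artist1", "item3", 0.9), ("artist1", "item2", 0.8), ("artist1", "item1", 0.35),
    ("artist2", "item4", 0.7), ("artist2", "item2", 0.425), ("artist2", "item3", 0.15),
]


def rank_within_artist(rows):
    """Rank = number of rows of the same artist whose average is >= this row's average."""
    ranks = {}
    for artist, item, avg in rows:
        ranks[(artist, item)] = sum(
            1
            for artist2, _item2, avg2 in rows
            # Same join on the artist and the same comparison as the SPARQL FILTER.
            if artist2 == artist and avg <= avg2
        )
    return ranks


ranks = rank_within_artist(ROWS)
for (artist, item), rank in ranks.items():
    print(artist, item, rank)
```

One consequence of comparing with `<=` is worth keeping in mind: items with equal averages count each other, so two items tied for first place both get rank 2, and a cut-off of 1 would return neither of them.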
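Now you can filter on that rank so that you only get the items at or above some cut-off. Here HAVING keeps the items with a rank of at most 2, that is, the top two per artist: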
prefix : <urn:ex:>
select ?item ?avgRating ?artist {
  {
    select ?item (avg(?rating) as ?avgRating) ?artist {
      ?artist :p [ :item ?item ; :rating ?rating ] .
    }
    group by ?artist ?item
  }
  {
    select ?item2 (avg(?rating) as ?avgRating2) ?artist {
      ?artist :p [ :item ?item2 ; :rating ?rating ] .
    }
    group by ?artist ?item2
  }
  filter (?avgRating <= ?avgRating2)
}
group by ?item ?artist ?avgRating
having (count(?item2) <= 2)
---------------------------------
| item   | avgRating | artist   |
=================================
| :item3 | 0.9       | :artist1 |
| :item2 | 0.8       | :artist1 |
| :item4 | 0.7       | :artist2 |
| :item2 | 0.425     | :artist2 |
---------------------------------
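It would probably be better to do this at the application layer, but it can certainly be done in SPARQL. In principle, this query computes the average rating of each item twice (once in the first subquery and again in the second), but a good query optimizer might figure out that the two subqueries are identical apart from variable naming and evaluate them only once.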
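For comparison, the application-layer alternative could look roughly like the sketch below. It assumes the simple GROUP BY query has already been run and its rows converted to `(artist, item, avgRating)` tuples; how the rows are fetched from the endpoint is left out, and `ROWS` and `top_n_per_artist` are hypothetical names.

```python
from itertools import groupby, islice
from operator import itemgetter

# (artist, item, avgRating) rows, as the simple GROUP BY query would return them.
ROWS = [
    ("artist1", "item3", 0.9), ("artist1", "item2", 0.8), ("artist1", "item1", 0.35),
    ("artist2", "item4", 0.7), ("artist2", "item2", 0.425), ("artist2", "item3", 0.15),
]


def top_n_per_artist(rows, n):
    """Keep the n highest-rated rows for each artist."""
    # Sort by artist, then by rating descending, so groupby sees each artist once.
    ordered = sorted(rows, key=lambda row: (row[0], -row[2]))
    top = []
    for _artist, group in groupby(ordered, key=itemgetter(0)):
        top.extend(islice(group, n))  # first n rows of this artist's group
    return top


for artist, item, avg in top_n_per_artist(ROWS, 2):
    print(artist, item, avg)
```

Unlike the SPARQL version, this sketch always returns up to n rows per artist: ties are broken by the input order, because Python's sort is stable. Whether that is preferable depends on how ties should be treated in the recommendation.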