Neo4j - querying N items per group

Here is my query:

MATCH (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)<-[:PUBLISHED]-(i:Item)-[:TAGGED]->(t:Tag)<-[f:FOLLOWS]-(u)
RETURN i, count(t) ORDER BY count(t) DESC LIMIT 100

So a User can follow a Publisher or a Tag. The query finds items the user might like by counting the tags on each item that the user also follows.

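Suppose there are two properties on the relationship u-r->p, MIN and MAX. These properties specify how many items the user wants to see from each publisher. How can I rewrite the query to allow for this?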

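Here is an idea. Say, for example, that the FOLLOWS relationship has a min and a max value set. You could use the following query to limit the data returned according to those values. Note that I have not rewritten the whole query to include the tags and the limit.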

// find the user and the publisher and the relationship 
// which has the min/max parameters
match (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)
with u, p, r

// match the items that the publisher published
match (p)-[:PUBLISHED]-(i:Item)

// order them just because we can
with u, p, r, i
order by i.name

// collect the ordered items as the total list of items
with u, p, r, collect(i.name) as items

// make sure the collection is >= the minimum size of the list
// if so then return the items in the collection up to the max length
// otherwise return an empty collection
// you might want to do something else
// (size() is used for lists; length() on lists is rejected by newer Neo4j versions)
with u, p, r, case 
  when size(items) >= r.min then items[..r.max]
  else []
end as items
return u.name, p.name, r.min, r.max, items
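
To try the query above, a small graph is enough. The following is a minimal sketch with hypothetical sample data (the names 'alice', 'Acme Press' and 'item-a' through 'item-d' are made up for illustration, as is the choice of min = 2 and max = 3). The PUBLISHED direction follows the pattern in the question.

```cypher
// Hypothetical sample data: one user following one publisher,
// asking for at least 2 and at most 3 items from it.
CREATE (u:User {id: 1, name: 'alice'}),
       (p:Publisher {name: 'Acme Press'}),
       (u)-[:FOLLOWS {min: 2, max: 3}]->(p),
       (:Item {name: 'item-a'})-[:PUBLISHED]->(p),
       (:Item {name: 'item-b'})-[:PUBLISHED]->(p),
       (:Item {name: 'item-c'})-[:PUBLISHED]->(p),
       (:Item {name: 'item-d'})-[:PUBLISHED]->(p)
```

With this data the publisher has four items, which satisfies the minimum of 2, so the query should return the first three item names in alphabetical order for that publisher.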

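Unfortunately, by that point you have already run the query that fetches the items, and you are only filtering them out for display purposes. It would be better to know the person's preferences beforehand, so that you could apply the max limit inside the item query itself using LIMIT and a parameter. That would eliminate unnecessary database hits. Depending on the publisher there could be many, many items, and limiting them up front could be advantageous.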
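
One way to read that suggestion is as two separate queries: first fetch the preferences, then run a limited item query once per publisher. The sketch below assumes that publishers carry an id property and that the client supplies the parameters $userId, $publisherId and $maxItems; none of these names come from the original question. It uses the $param syntax of Neo4j 3.0 and later (older versions wrote parameters as {param}).

```cypher
// Query 1: read the user's per-publisher preferences.
// Assumes Publisher nodes have an id property (not stated in the question).
MATCH (u:User {id: $userId})-[r:FOLLOWS]->(p:Publisher)
RETURN p.id AS publisherId, r.min AS minItems, r.max AS maxItems;

// Query 2: run once per row returned by query 1,
// passing that row's publisherId and maxItems as parameters.
MATCH (p:Publisher {id: $publisherId})-[:PUBLISHED]-(i:Item)
RETURN i.name AS item
ORDER BY i.name
LIMIT $maxItems
```

The split is needed because LIMIT accepts a parameter but cannot refer to a property of a matched relationship such as r.max. The cost is one extra round trip per publisher, and the minimum still has to be checked by the client against the number of rows returned.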

There are a few variations to experiment with here too. You could also do it like this...

// slight variation where the minimum is enforced with where instead of case
match (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)
with u, p, r
match (p)-[:PUBLISHED]-(i:Item)
with u, p, r, i
order by i.name
with u, p, r, collect(i.name) as items
where size(items) >= r.min
return u.name, p.name, items[..r.max]

Or even this...

// only results actually between the min and max are returned
match (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)
with u, p, r
match (p)-[:PUBLISHED]-(i:Item)
with u, p, r, i
order by i.name
with u, p, r, collect(i.name) as items
where size(items) >= r.min
and size(items) <= r.max
return u.name, p.name, items[..r.max]
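
The queries above leave out the tag matching from the question. As a rough sketch of how the two could be combined, the following ranks each publisher's items by the number of followed tags and then keeps at most r.max per publisher. It assumes every FOLLOWS relationship to a publisher has both min and max set, and the aliases matchedTags, ranked, publisher and items are illustrative names.

```cypher
// Count, per item, how many of its tags the user follows.
MATCH (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)<-[:PUBLISHED]-(i:Item)-[:TAGGED]->(t:Tag)<-[:FOLLOWS]-(u)
WITH p, r, i, count(t) AS matchedTags
// Sort before collecting so that each list is ordered best match first.
ORDER BY matchedTags DESC
WITH p, r, collect({item: i.name, tags: matchedTags}) AS ranked
// Drop publishers that cannot supply the minimum number of matching items.
WHERE size(ranked) >= r.min
// Keep at most r.max items per publisher.
RETURN p.name AS publisher, ranked[..r.max] AS items
```

Only items that share at least one followed tag reach the collect step, so the minimum is checked against matching items rather than against everything the publisher has published. Like the earlier variants, this still reads all matching items before slicing.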