Rails - Distinct ON after a join

I'm using Rails 4.2 and PostgreSQL. I have a Product model and a Purchase model, and Product has many Purchases. I want to find the distinct, most recently purchased products. Initially I tried:

Product.joins(:purchases)
.select("DISTINCT products.*, purchases.updated_at") #postgresql requires order column in select
.order("purchases.updated_at DESC")

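However, this produces duplicates, because DISTINCT looks for tuples in which the combination (products.id, purchases.updated_at) is unique. What I want is to select only products with distinct ids after the join: if a product id appears several times in the join, only the first row should be selected. So I also tried: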

Product.joins(:purchases)
.select("DISTINCT ON (products.id) purchases.updated_at, products.*")
.order("products.id, purchases.updated_at") #postgres requires that DISTINCT ON must match the leftmost order by clause

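This doesn't work, because this constraint forces me to put products.id first in the ORDER BY clause, which returns the rows in an unexpected order (by product id instead of by purchase date).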
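To make the difference between the two attempts concrete, here is a plain-Ruby sketch with no database involved. The JoinedRow struct and the sample rows are hypothetical stand-ins for the result of products JOIN purchases. Array#uniq on whole rows behaves roughly like SELECT DISTINCT, while uniq with a block keeps one row per product id, which is the idea behind DISTINCT ON:

```ruby
# Hypothetical stand-in for one row of `products JOIN purchases`.
JoinedRow = Struct.new(:product_id, :name, :purchased_at)

joined = [
  JoinedRow.new(1, "Lamp", Time.utc(2015, 1, 3)),
  JoinedRow.new(1, "Lamp", Time.utc(2015, 1, 9)),
  JoinedRow.new(2, "Desk", Time.utc(2015, 1, 5)),
]

# Like SELECT DISTINCT products.*, purchases.updated_at: uniqueness is
# judged on the whole row, so product 1 still appears twice.
distinct_rows = joined.uniq

# Like DISTINCT ON (products.id): one row per product id, keeping
# whichever row comes first in the current order.
distinct_products = joined.uniq(&:product_id)

puts distinct_rows.size     # prints 3
puts distinct_products.size # prints 2
```

Product 1 keeps its January 3 purchase only because that row happens to come first, which is why the ORDER BY used together with DISTINCT ON matters.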

What is the Rails way to achieve this?

Use a subquery and add a different ORDER BY clause in the outer SELECT:

SELECT *
FROM  (
   SELECT DISTINCT ON (pr.id)
          pu.updated_at AS last_purchased_at, pr.*
   FROM   products pr
   JOIN   purchases pu ON pu.product_id = pr.id  -- guessing
   ORDER  BY pr.id, pu.updated_at DESC NULLS LAST
   ) sub
ORDER  BY last_purchased_at DESC NULLS LAST;
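The two-step logic of this query can be mimicked in plain Ruby to see what each ORDER BY contributes. This is only an in-memory sketch with a hypothetical JoinRow struct and made-up data, and it ignores NULL handling:

```ruby
# Hypothetical stand-in for one row of `products JOIN purchases`.
JoinRow = Struct.new(:product_id, :purchased_at)

rows = [
  JoinRow.new(1, Time.utc(2015, 1, 3)),
  JoinRow.new(3, Time.utc(2015, 1, 1)),
  JoinRow.new(2, Time.utc(2015, 1, 5)),
  JoinRow.new(1, Time.utc(2015, 1, 4)),
  JoinRow.new(3, Time.utc(2015, 1, 9)),
]

# Inner query: ORDER BY pr.id, pu.updated_at DESC, then DISTINCT ON (pr.id)
# keeps the first row of each id group, i.e. each product's latest purchase.
latest_per_product = rows
  .sort_by { |r| [r.product_id, -r.purchased_at.to_i] }
  .uniq(&:product_id)

# Outer query: re-sort the surviving rows by purchase time, newest first.
result = latest_per_product.sort_by { |r| -r.purchased_at.to_i }

result.each { |r| puts "#{r.product_id} #{r.purchased_at.strftime('%F')}" }
```

The inner sort is what picks the right row per product; the outer sort only changes the order in which those rows are returned.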

Details for DISTINCT ON:

  • Select first row in each GROUP BY group?

Or some other query techniques:

  • Optimize GROUP BY query to retrieve latest record per user

But if all you need from Purchases is updated_at, you can get it more cheaply with a simple aggregate in a subquery before you join:

SELECT *
FROM   products pr
JOIN  (
   SELECT product_id, max(updated_at) AS updated_at
   FROM   purchases
   GROUP  BY 1
   ) pu ON pu.product_id = pr.id  -- guessing
ORDER  BY pu.updated_at DESC NULLS LAST;
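A plain-Ruby sketch of the "aggregate first, then join" shape, assuming hypothetical ProductRec and PurchaseRec structs and made-up rows. The purchases are reduced to one latest timestamp per product before they ever meet the products:

```ruby
# Hypothetical stand-ins for rows of the products and purchases tables.
ProductRec  = Struct.new(:id, :name)
PurchaseRec = Struct.new(:product_id, :updated_at)

products = [
  ProductRec.new(1, "Lamp"), ProductRec.new(2, "Desk"),
  ProductRec.new(3, "Chair"), ProductRec.new(4, "Shelf") # never purchased
]
purchases = [
  PurchaseRec.new(1, Time.utc(2015, 1, 3)),
  PurchaseRec.new(1, Time.utc(2015, 1, 4)),
  PurchaseRec.new(2, Time.utc(2015, 1, 5)),
  PurchaseRec.new(3, Time.utc(2015, 1, 1)),
  PurchaseRec.new(3, Time.utc(2015, 1, 9)),
]

# Subquery: SELECT product_id, max(updated_at) ... GROUP BY product_id.
last_purchase = purchases
  .group_by(&:product_id)
  .transform_values { |ps| ps.map(&:updated_at).max }

# JOIN ... ON pu.product_id = pr.id: the inner join drops product 4.
joined = products
  .select { |p| last_purchase.key?(p.id) }
  .map { |p| [p, last_purchase[p.id]] }

# ORDER BY pu.updated_at DESC
result = joined.sort_by { |_, at| -at.to_i }

result.each { |p, at| puts "#{p.name} #{at.strftime('%F')}" }
```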

About NULLS LAST:

  • PostgreSQL sort by datetime asc, null first?
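In PostgreSQL, NULL sorts as if larger than any non-NULL value by default, so a plain DESC puts NULLs first and NULLS LAST moves them to the end. A small Ruby sketch of the same ordering; the timestamps are made up, and since Ruby cannot compare nil with Time, the sort key handles nil explicitly:

```ruby
# Made-up purchase timestamps; nil plays the role of a NULL updated_at.
times = [Time.utc(2015, 1, 3), nil, Time.utc(2015, 1, 9), nil, Time.utc(2015, 1, 5)]

# Equivalent of ORDER BY updated_at DESC NULLS LAST:
# non-NULL values first (key 0), newest first, then all NULLs (key 1).
sorted = times.sort_by { |t| t.nil? ? [1, 0] : [0, -t.to_i] }

sorted.each { |t| puts(t ? t.strftime('%F') : "NULL") }
```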

Or simpler, but not as fast when retrieving all rows:

SELECT pr.*, max(pu.updated_at) AS last_purchased_at
FROM   products pr
JOIN   purchases pu ON pu.product_id = pr.id
GROUP  BY pr.id  -- must be primary key
ORDER  BY last_purchased_at DESC NULLS LAST;

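products.id (pr.id) needs to be defined as the primary key for this to work: since PostgreSQL 9.1, grouping by a table's primary key lets you select all of that table's other columns. Details: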

  • PostgreSQL - GROUP BY clause
  • Return a grouped list with occurrences using Rails and PostgreSQL

This simpler form is faster if you only fetch a small selection (e.g. restricted to one or a few pr.id with a WHERE clause).
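A plain-Ruby sketch of why the join-then-group form suits small selections: the WHERE filter runs first, so only the purchases of the chosen products are joined and aggregated. The ProductRow and PurchaseRow structs, the sample data and wanted_ids are all hypothetical:

```ruby
# Hypothetical stand-ins for rows of the products and purchases tables.
ProductRow  = Struct.new(:id, :name)
PurchaseRow = Struct.new(:product_id, :updated_at)

products = [ProductRow.new(1, "Lamp"), ProductRow.new(2, "Desk"), ProductRow.new(3, "Chair")]
purchases = [
  PurchaseRow.new(1, Time.utc(2015, 1, 3)),
  PurchaseRow.new(1, Time.utc(2015, 1, 4)),
  PurchaseRow.new(2, Time.utc(2015, 1, 5)),
  PurchaseRow.new(3, Time.utc(2015, 1, 9)),
]

# WHERE pr.id IN (1, 3): restrict the products before joining.
wanted_ids = [1, 3]
selected = products.select { |p| wanted_ids.include?(p.id) }

# JOIN purchases ... GROUP BY pr.id, with max(pu.updated_at) per product.
grouped = selected.map do |p|
  [p, purchases.select { |pu| pu.product_id == p.id }.map(&:updated_at).max]
end

# Inner join: a product without purchases (max is nil) yields no row.
# Then order by the max, newest first.
result = grouped.reject { |_, at| at.nil? }.sort_by { |_, at| -at.to_i }

result.each { |p, at| puts "#{p.name} #{at.strftime('%F')}" }
```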

Try this:

Product.joins(:purchases)
.select("DISTINCT ON (purchases.product_id) purchases.product_id, purchases.updated_at, products.*")
.order("purchases.product_id, purchases.updated_at DESC") #postgres requires that DISTINCT ON must match the leftmost order by clause

I ended up with this:

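Product.joins(:purchases)
.select("DISTINCT ON (products.id) products.*, purchases.updated_at as date")
.order("products.id, purchases.updated_at DESC") # keep each product's latest purchase
.sort_by(&:date) # sorts in Ruby, after loading every row
.reverse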

Still looking for a better way.

Building on erwin-brandstetter's answer, this is how you can do it with ActiveRecord (it should at least be close):

Product
  .select('*')
  .joins('INNER JOIN (SELECT product_id, max(updated_at) AS updated_at FROM purchases GROUP BY 1) pu ON pu.product_id = products.id')
  .order('pu.updated_at DESC NULLS LAST')

Based on @ErwinBrandstetter's answer, I finally found the right way to do it. The query to find the distinct recently purchased products is:

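SELECT *
FROM  (
   SELECT DISTINCT ON (pr.id)
          pu.updated_at AS last_purchased_at, pr.*
   FROM   products pr
   JOIN   purchases pu ON pu.product_id = pr.id
   ORDER  BY pr.id, pu.updated_at DESC
   ) sub
ORDER  BY last_purchased_at DESC NULLS LAST;

The outer ORDER BY does not make the ORDER BY inside the subquery redundant: without it, DISTINCT ON keeps an unpredictable row for each product, so the date is not guaranteed to be the most recent purchase. The outer ORDER BY only sorts the rows that DISTINCT ON has already picked.

The Rails way to do this is:

inner_query = Product.joins(:purchases)
  .select("DISTINCT ON (products.id) products.*, purchases.updated_at as date")
  .order("products.id, purchases.updated_at DESC") # all unique purchased products, each with its latest purchase

result = Product.from("(#{inner_query.to_sql}) as unique_purchases")
  .select("unique_purchases.*").order("unique_purchases.date DESC")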
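The PostgreSQL documentation notes that the first row of each DISTINCT ON group is unpredictable unless ORDER BY puts the desired row first. A plain-Ruby sketch of that effect, using a hypothetical PurchaseEvt struct and made-up data, where "first row" simply means first in input order:

```ruby
# Hypothetical stand-in for one purchase row.
PurchaseEvt = Struct.new(:product_id, :updated_at)

# The same two purchases of product 1, in two possible physical row orders.
order_a = [PurchaseEvt.new(1, Time.utc(2015, 1, 3)), PurchaseEvt.new(1, Time.utc(2015, 1, 9))]
order_b = order_a.reverse

# DISTINCT ON (product_id) without ORDER BY: whichever row arrives first wins.
pick_a = order_a.uniq(&:product_id).first.updated_at
pick_b = order_b.uniq(&:product_id).first.updated_at

# With ORDER BY product_id, updated_at DESC the pick no longer depends on input order.
def latest_per_product(rows)
  rows.sort_by { |r| [r.product_id, -r.updated_at.to_i] }.uniq(&:product_id)
end

puts pick_a == pick_b                                           # prints false
puts latest_per_product(order_a) == latest_per_product(order_b) # prints true
```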

The second (and better) way suggested by @ErwinBrandstetter is:

SELECT *
FROM   products pr
JOIN  (
   SELECT product_id, max(updated_at) AS updated_at
   FROM   purchases
   GROUP  BY 1
   ) pu ON pu.product_id = pr.id
ORDER  BY pu.updated_at DESC NULLS LAST;

In Rails this can be written as:

join_query = Purchase.select("product_id, max(updated_at) as date")
  .group(:product_id) # most recent purchase date for each purchased product

result = Product.joins("INNER JOIN (#{join_query.to_sql}) as unique_purchases ON products.id = unique_purchases.product_id")
  .order("unique_purchases.date DESC")