PostgreSQL: select a column that correlates with the value in the aggregate function

Here is the 'items' table, which contains more than 10 rows:

+-----+-----------+-----------+----------+
| id  | item_name | category  | quantity |
+=====+===========+===========+==========+
| 3   | item33    | category1 | 5        |
+-----+-----------+-----------+----------+
| 2   | item52    | category5 | 1        |
+-----+-----------+-----------+----------+
| 1   | item46    | category1 | 3        |
+-----+-----------+-----------+----------+
| 4   | item11    | category3 | 2        |
+-----+-----------+-----------+----------+
| ... | ...       | ...       | ...      |
+-----+-----------+-----------+----------+

Values in the 'item_name' column are unique; values in the 'category' column are not.
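
To try the queries in this thread, the sample rows shown above could be loaded with something like the sketch below. The column types are assumptions (they are not stated in the question), and only the four rows visible in the table are inserted.

```sql
-- Hypothetical DDL: the column types are guesses, not taken from the question.
CREATE TABLE items (
   id        integer PRIMARY KEY,
   item_name text    NOT NULL UNIQUE,
   category  text    NOT NULL,
   quantity  integer NOT NULL
);

-- Only the rows visible in the sample table; the real table has more.
INSERT INTO items (id, item_name, category, quantity) VALUES
   (3, 'item33', 'category1', 5),
   (2, 'item52', 'category5', 1),
   (1, 'item46', 'category1', 3),
   (4, 'item11', 'category3', 2);
```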

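The task is:

  1. Remove duplicate categories: if a category contains more than 1 item, take the row with the minimum 'id'.
  2. Sort the results by 'quantity' (ASC).
  3. Take 10 rows from the remaining output: the top 5 rows and 5 random rows.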

So the sorted table (after sub-task #2) should look like this:

+-----+-----------+-----------+----------+
| id  | item_name | category  | quantity |
+=====+===========+===========+==========+
| 2   | item52    | category5 | 1        |
+-----+-----------+-----------+----------+
| 4   | item11    | category3 | 2        |
+-----+-----------+-----------+----------+
| 1   | item46    | category1 | 3        |
+-----+-----------+-----------+----------+
| ... | ...       | ...       | ...      |
+-----+-----------+-----------+----------+

I know how to exclude duplicate categories:

SELECT min(id) as id, category
FROM items
GROUP BY category

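But I don't know how to order by quantity. If I try to add 'quantity' to the SELECT list and then add 'ORDER BY quantity', I get the error: column "quantity" must appear in the GROUP BY clause or be used in an aggregate function.

Is there a way to add this 'quantity' column to the output (the values in this column should correlate with the resulting 'id' value, i.e. min(id))? And then do the sorting and pick the rows...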

You need to use analytic (window) functions, as follows:

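SELECT *
FROM (
   SELECT t.*,
          row_number() OVER (ORDER BY quantity) AS rn_q  -- rank by quantity after deduplication
   FROM (
      SELECT t.*,
             row_number() OVER (PARTITION BY category ORDER BY id) AS rn
      FROM your_table t
   ) t
   WHERE rn = 1  -- keep the row with the smallest id per category
) t
ORDER BY CASE WHEN rn_q <= 5 THEN rn_q ELSE 6 END,  -- top 5 first, in quantity order
         random()                                    -- remaining rows in random order
LIMIT 10;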

Consider joining the aggregate query back to the unit-level data to get all columns, including quantity:

SELECT i.id, i.item_name, i.category, i.quantity
FROM items i
INNER JOIN 
  (SELECT category, min(id) AS min_id
   FROM items
   GROUP BY category) agg
 ON i.id = agg.min_id
 AND i.category = agg.category
ORDER BY i.quantity

For the top 5 and random 5 split, combine a union with a CTE that holds the result set:

WITH sub AS (
  SELECT i.id, i.item_name, i.category, i.quantity
  FROM items i
  INNER JOIN 
    (SELECT category, min(id) AS min_id
     FROM items
     GROUP BY category) agg
   ON i.id = agg.min_id
   AND i.category = agg.category
)

-- TOP 5 ROWS (parentheses are required around a UNION branch with ORDER BY / LIMIT)
(SELECT id, item_name, category, quantity
 FROM sub
 ORDER BY quantity
 LIMIT 5)

UNION

-- RANDOM ROWS OF NON-TOP 5
(SELECT id, item_name, category, quantity
 FROM 
   (SELECT id, item_name, category, quantity
    FROM sub
    ORDER BY quantity
    OFFSET 5) below5
 ORDER BY random()
 LIMIT 5)

Basically, DISTINCT ON works well for this in Postgres. See:

  • Select first row in each GROUP BY group?
  • PostgreSQL DISTINCT ON with different ORDER BY
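
As a minimal sketch of the DISTINCT ON idea on its own, assuming the question's items table: the inner query keeps one row per category (the one with the smallest id), and the outer query applies the different sort order by quantity. The alias sub is only an illustrative name.

```sql
SELECT *
FROM  (
   SELECT DISTINCT ON (category) *
   FROM   items
   ORDER  BY category, id   -- per category, keep the row with the smallest id
   ) sub
ORDER  BY quantity, id;     -- then sort the deduplicated rows by quantity
```

The leading ORDER BY expressions of the inner query have to match the DISTINCT ON expressions, which is why the sort by quantity happens in an outer query.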

Simple (and correct!) solution:

WITH dist_cat AS (
   SELECT t, row_number() OVER (ORDER BY quantity, id) AS rn   -- added id as tiebreaker
   FROM  (
      SELECT DISTINCT ON (category) *
      FROM   tbl
      ORDER  BY category, id
      ) t  -- distinct categories
   ORDER  BY quantity, id  -- match sort for row_number()
   )
SELECT (t).*
FROM   dist_cat
WHERE  rn <= 5

UNION ALL   -- not just UNION
(  -- parentheses required
SELECT (t).*
FROM   dist_cat
WHERE  rn > 5
ORDER  BY random()
LIMIT  5
);

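I added id as a tiebreaker to the sort order because ordering by quantity alone is hardly deterministic. Put any unique expression there that suits your requirements. If you are happy with arbitrary results that may change with every call, you can skip it.

The row type t is for convenience: we don't have to spell out all column names, and we still drop the appended rn from the result, since it was not requested.

I chose to sort rows in the CTE and append the row number rn there to avoid additional sort operations.

The other 5 random rows are chosen truly randomly, not arbitrarily.

Use UNION ALL, not just UNION, because it is the correct operation for what we are doing, and it is also cheaper. It also preserves the sort order from the CTE; UNION would try to remove duplicates, possibly messing up the order - in vain.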
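
A small sketch of the row-type trick in isolation, assuming the question's items table (the constant rn and the alias sub are only placeholders for illustration): selecting the table alias t yields a single composite column, and (t).* in the outer query expands it back into the original columns without the extra one.

```sql
SELECT (t).*          -- expands to id, item_name, category, quantity; rn is left out
FROM  (
   SELECT t,          -- the whole row as one composite-typed column
          1 AS rn     -- stand-in for the appended row number
   FROM   items t
   ) sub;
```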
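
The difference is easy to see on constants. This is just an illustrative sketch that is independent of the items table:

```sql
SELECT 1 AS n UNION     SELECT 1;  -- one row: UNION removes duplicates
SELECT 1 AS n UNION ALL SELECT 1;  -- two rows: UNION ALL simply appends
```

In the query above no row can appear in both branches (rn <= 5 versus rn > 5), so the duplicate removal done by UNION would be wasted work.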

For big tables, depending on data distribution, there may be (much) faster techniques...

... for getting distinct categories:

  • Optimize GROUP BY query to retrieve latest row per user

... for getting random rows:

  • Best way to select random rows PostgreSQL