PostgreSQL. Select a column that correlates with the value in the aggregate function

Question

Here is the 'items' table, containing more than 10 rows:

+-----+-----------+-----------+----------+
| id  | item_name | category  | quantity |
+=====+===========+===========+==========+
| 3   | item33    | category1 | 5        |
+-----+-----------+-----------+----------+
| 2   | item52    | category5 | 1        |
+-----+-----------+-----------+----------+
| 1   | item46    | category1 | 3        |
+-----+-----------+-----------+----------+
| 4   | item11    | category3 | 2        |
+-----+-----------+-----------+----------+
| ... | ...       | ...       | ...      |
+-----+-----------+-----------+----------+

Values in the 'id' column are unique; values in the 'category' column are not.

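The task is:

1. Remove duplicate categories: if a category contains more than 1 item, take the row with the minimum 'id'.
2. Sort the result by 'quantity' (ASC).
3. Take 10 rows from the remaining output: the top 5 rows and 5 random rows.

So the sorted table (after subtask #2) should look like this: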

+-----+-----------+-----------+----------+
| id  | item_name | category  | quantity |
+=====+===========+===========+==========+
| 2   | item52    | category5 | 1        |
+-----+-----------+-----------+----------+
| 4   | item11    | category3 | 2        |
+-----+-----------+-----------+----------+
| 1   | item46    | category1 | 3        |
+-----+-----------+-----------+----------+
| ... | ...       | ...       | ...      |
+-----+-----------+-----------+----------+

I know how to exclude the duplicate categories:

SELECT min(id) as id, category
FROM items
GROUP BY category

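But I don't know how to order by quantity. If I add 'quantity' to the SELECT list and then add 'ORDER BY quantity', I get the error: column "quantity" must appear in the GROUP BY clause or be used in an aggregate function.

Is there a way to add this 'quantity' column to the output, so that its values correlate with the resulting 'id' value (i.e. min(id))? And then do the sorting and picking of rows...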

Answer 1

You need to use analytic (window) functions as follows:

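Select * from
(Select t.*,
       Row_number() over (order by quantity) as rn_q   -- rank by quantity
 from
(Select t.*,
       Row_number() over (partition by category order by id) as rn
  From your_table t) t
Where rn = 1) t   -- keep the row with the lowest id per category
-- rows ranked 1-5 keep their order; all others share key 6 and are shuffled
Order by case when rn_q <= 5 then rn_q else 6 end, random()
Limit 10;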

Answer 2

Consider joining the aggregate query back to the unit-level data to get all columns, including quantity:

SELECT i.id, i.item_name, i.category, i.quantity
FROM items i
INNER JOIN 
  (SELECT category, min(id) AS min_id
   FROM items
   GROUP BY category) agg
 ON i.id = agg.min_id
 AND i.category = agg.category
ORDER BY i.quantity

For the top 5 and random 5 split, combine a union with a CTE that holds the result set:

WITH sub AS (
  SELECT i.id, i.item_name, i.category, i.quantity
  FROM items i
  INNER JOIN 
    (SELECT category, min(id) AS min_id
     FROM items
     GROUP BY category) agg
   ON i.id = agg.min_id
   AND i.category = agg.category
)

-- TOP 5 ROWS (parentheses are required to use ORDER BY / LIMIT inside a UNION branch)
(SELECT id, item_name, category, quantity
 FROM sub
 ORDER BY quantity
 LIMIT 5)

UNION

-- RANDOM ROWS OF NON-TOP 5
(SELECT id, item_name, category, quantity
 FROM 
   (SELECT id, item_name, category, quantity
    FROM sub
    ORDER BY quantity
    OFFSET 5) below5
 ORDER BY random()
 LIMIT 5)

Answer 3

Basically, DISTINCT ON works well for this in Postgres. See:

Select first row in each GROUP BY group?
PostgreSQL DISTINCT ON with different ORDER BY
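
As a quick illustration of DISTINCT ON for the first two subtasks, here is a minimal sketch. It inlines the four sample rows from the question as a CTE named items (a stand-in, so the snippet runs without the real table):

```sql
-- Stand-in data: the four sample rows shown in the question.
WITH items (id, item_name, category, quantity) AS (
   VALUES (3, 'item33', 'category1', 5)
        , (2, 'item52', 'category5', 1)
        , (1, 'item46', 'category1', 3)
        , (4, 'item11', 'category3', 2)
   )
SELECT *
FROM  (
   SELECT DISTINCT ON (category) *   -- one row per category ...
   FROM   items
   ORDER  BY category, id            -- ... the one with the smallest id
   ) sub
ORDER  BY quantity, id;              -- then sort the survivors by quantity
```

With these four rows it returns ids 2, 4, 1 in that order, matching the expected table in the question.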

Simple (and correct!) solution:

WITH dist_cat AS (
   SELECT t, row_number() OVER (ORDER BY quantity, id) AS rn   -- added id as tiebreaker
   FROM  (
      SELECT DISTINCT ON (category) *
      FROM   tbl
      ORDER  BY category, id
      ) t  -- distinct categories
   ORDER  BY quantity, id  -- match sort for row_number()
   )
SELECT (t).*
FROM   dist_cat
WHERE  rn <= 5

UNION ALL   -- not just UNION
(  -- parentheses required
SELECT (t).*
FROM   dist_cat
WHERE  rn > 5
ORDER  BY random()
LIMIT  5
);

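I added id as a tiebreaker for the sort, since ordering by quantity alone is hardly deterministic. Put any unique expression there that fits your requirements. If you are happy with arbitrary results that may change with every call, you can skip it.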
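
To see what the tiebreaker buys you, here is a small sketch with made-up rows (the CTE t and its values are only for illustration). Two rows share quantity 2, so their relative numbering is only pinned down once id is part of the sort:

```sql
-- Made-up rows: ids 1 and 2 tie on quantity.
WITH t (id, quantity) AS (
   VALUES (1, 2), (2, 2), (3, 1)
   )
SELECT id, quantity
     , row_number() OVER (ORDER BY quantity)     AS rn_arbitrary  -- ids 1 and 2 may swap between calls
     , row_number() OVER (ORDER BY quantity, id) AS rn_stable     -- id 3 -> 1, id 1 -> 2, id 2 -> 3
FROM   t
ORDER  BY quantity, id;
```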

The row type t is for convenience: we don't have to spell out all column names, and the result still drops the added rn, which was not requested.
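
A stripped-down sketch of that row-type trick, using a made-up three-column tbl inlined as a CTE (the CTE name numbered is also just for this example):

```sql
-- Made-up stand-in for tbl.
WITH tbl (id, category, quantity) AS (
   VALUES (1, 'category1', 3)
        , (2, 'category5', 1)
   )
, numbered AS (
   SELECT t                                          -- the whole row as one composite value
        , row_number() OVER (ORDER BY quantity, id) AS rn
   FROM   tbl t
   )
SELECT (t).*      -- expands back to id, category, quantity; rn is not part of t
FROM   numbered
ORDER  BY rn;
```

The parentheses in (t).* are needed so that t is read as a column holding a row, not as a table name.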

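I chose to sort the rows in the CTE and append the row number rn to avoid additional sort operations.

The other 5 rows are chosen truly at random, not just arbitrarily.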

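Use UNION ALL, not just UNION. It is the correct operation for what we are doing, and it is cheaper. It also preserves the sort order of the CTE; UNION would try to remove duplicates, in vain, and might mess up the order.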
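
The difference in a nutshell, with throwaway constants instead of the real table:

```sql
-- UNION folds duplicates, which takes extra work to detect them.
SELECT 1 AS n
UNION
SELECT 1;        -- returns a single row

-- UNION ALL just appends the second result to the first.
SELECT 1 AS n
UNION ALL
SELECT 1;        -- returns two rows
```

In the query above, the two branches (rn <= 5 and rn > 5) cannot overlap, so there are no duplicates for UNION to remove in the first place.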

For big tables, depending on data distribution, there may be (much) faster techniques ...

... for getting distinct categories:

Optimize GROUP BY query to retrieve latest row per user

... for getting random rows:

Best way to select random rows PostgreSQL

sql

random

postgresql

greatest-n-per-group