PostgreSQL: select a column that correlates with the value in an aggregate function
Here is the 'items' table, containing more than 10 rows:
+-----+-----------+-----------+----------+
| id  | item_name | category  | quantity |
+=====+===========+===========+==========+
| 3   | item33    | category1 | 5        |
+-----+-----------+-----------+----------+
| 2   | item52    | category5 | 1        |
+-----+-----------+-----------+----------+
| 1   | item46    | category1 | 3        |
+-----+-----------+-----------+----------+
| 4   | item11    | category3 | 2        |
+-----+-----------+-----------+----------+
| ... | ...       | ...       | ...      |
+-----+-----------+-----------+----------+
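Values in the 'item_name' column are unique; values in the 'category' column are not.
The task is:
- Remove duplicate categories: if a category contains more than one item, take the row with the minimum 'id'.
- Order the result by 'quantity' (ASC).
- Take 10 rows from the result: the top 5 rows and 5 random rows from the rest.
So the sorted table (after the second subtask) should look like this: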
+-----+-----------+-----------+----------+
| id  | item_name | category  | quantity |
+=====+===========+===========+==========+
| 2   | item52    | category5 | 1        |
+-----+-----------+-----------+----------+
| 4   | item11    | category3 | 2        |
+-----+-----------+-----------+----------+
| 1   | item46    | category1 | 3        |
+-----+-----------+-----------+----------+
| ... | ...       | ...       | ...      |
+-----+-----------+-----------+----------+
I know how to exclude duplicate categories:
SELECT min(id) as id, category
FROM items
GROUP BY category
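But I don't know how to order the result by quantity.
If I add 'quantity' to the SELECT list and then add 'ORDER BY quantity', I get the error: column "quantity" must appear in the GROUP BY clause or be used in an aggregate function.
Is there a way to add the 'quantity' column to the output, with values that correspond to the resulting 'id' (i.e. min(id)), and then do the ordering and row picking?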
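You need to use analytic (window) functions, as follows:
SELECT *
FROM  (
   SELECT t.*,
          row_number() OVER (ORDER BY quantity) AS rn_q   -- rank by quantity
   FROM  (
      SELECT t.*,
             row_number() OVER (PARTITION BY category ORDER BY id) AS rn
      FROM   your_table t                                  -- alias required for t.*
      ) t
   WHERE  rn = 1                                           -- smallest id per category
   ) t
ORDER  BY CASE WHEN rn_q <= 5 THEN rn_q ELSE 6 END,       -- top 5 first, in order
          random()                                         -- remaining rows in random order
LIMIT  10;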
Consider joining the aggregate query back to the unit-level data to get all columns, including quantity:
SELECT i.id, i.item_name, i.category, i.quantity
FROM items i
INNER JOIN
(SELECT category, min(id) AS min_id
FROM items
GROUP BY category) agg
ON i.id = agg.min_id
AND i.category = agg.category
ORDER BY i.quantity
For the split into top 5 and random 5, combine a union with a CTE that holds the result set:
WITH sub AS (
SELECT i.id, i.item_name, i.category, i.quantity
FROM items i
INNER JOIN
(SELECT category, min(id) AS min_id
FROM items
GROUP BY category) agg
ON i.id = agg.min_id
AND i.category = agg.category
)
-- TOP 5 ROWS (parentheses are required to use ORDER BY / LIMIT inside a UNION branch)
(SELECT id, item_name, category, quantity
 FROM sub
 ORDER BY quantity
 LIMIT 5)
UNION
-- RANDOM ROWS OF NON-TOP 5
(SELECT id, item_name, category, quantity
 FROM
   (SELECT id, item_name, category, quantity
    FROM sub
    ORDER BY quantity
    OFFSET 5) below5
 ORDER BY random()
 LIMIT 5)
Basically, DISTINCT ON works well for this in Postgres. See:
- Select first row in each GROUP BY group?
- PostgreSQL DISTINCT ON with different ORDER BY
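For just the first two subtasks (one row per category, then sort by quantity), a minimal sketch could look like the following. It assumes the table is named items as in the question; the subquery alias sub is only illustrative.

```sql
-- Subtask 1: keep the row with the smallest id in each category.
-- Subtask 2: sort the remaining rows by quantity (id breaks ties).
SELECT *
FROM  (
   SELECT DISTINCT ON (category) *
   FROM   items
   ORDER  BY category, id     -- leading ORDER BY item must match DISTINCT ON
   ) sub
ORDER  BY quantity, id;
```

The outer query is needed because the ORDER BY of a DISTINCT ON query has to start with the DISTINCT ON expressions, so a different final sort order has to be applied one level up.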
Simple (and correct!) solution:
WITH dist_cat AS (
SELECT t, row_number() OVER (ORDER BY quantity, id) AS rn -- added id as tiebreaker
FROM (
SELECT DISTINCT ON (category) *
FROM tbl
ORDER BY category, id
) t -- distinct categories
ORDER BY quantity, id -- match sort for row_number()
)
SELECT (t).*
FROM dist_cat
WHERE rn <= 5
UNION ALL -- not just UNION
( -- parentheses required
SELECT (t).*
FROM dist_cat
WHERE rn > 5
ORDER BY random()
LIMIT 5
);
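Added id as a tiebreaker because ordering by quantity alone is hardly deterministic. Put any unique expression there that fits your requirements. You can skip it if you are happy with arbitrary results that may change from call to call.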
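To illustrate the tiebreaker, here is a small sketch on made-up values (the three rows are hypothetical, not taken from the question). Two rows share quantity 1, so ordering by quantity alone leaves their relative rank undefined.

```sql
WITH t(id, quantity) AS (
   VALUES (7, 1), (3, 1), (5, 2)   -- hypothetical sample rows
   )
SELECT id, quantity
     , row_number() OVER (ORDER BY quantity)     AS rn_arbitrary  -- ids 3 and 7 tie: either may get 1
     , row_number() OVER (ORDER BY quantity, id) AS rn_stable     -- id 3 gets 1, id 7 gets 2, id 5 gets 3
FROM   t
ORDER  BY quantity, id;
```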
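The row type t is for convenience, so we don't have to spell out all column names, and the added rn is still removed from the result, since it was not asked for.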
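The whole-row trick can be tried in isolation. This is a sketch under assumptions: the inline tbl with three sample rows stands in for the real table, and the CTE name wrapped is made up for the illustration.

```sql
WITH tbl(id, item_name, category, quantity) AS (
   VALUES (1, 'item46', 'category1', 3)   -- stand-in rows for the real table
        , (2, 'item52', 'category5', 1)
        , (4, 'item11', 'category3', 2)
   )
, wrapped AS (
   SELECT t                                              -- whole row as a single composite value
        , row_number() OVER (ORDER BY quantity, id) AS rn
   FROM   tbl t
   )
SELECT (t).*      -- expands back to id, item_name, category, quantity; rn is not included
FROM   wrapped
ORDER  BY rn;
```

The parentheses in (t).* matter: without them, t would be parsed as a table name rather than a column holding a row value.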
I chose to sort the rows in the CTE and attach the row number rn there to avoid an additional sort operation.
The 5 additional random rows are truly random picks, not arbitrary ones.
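Use UNION ALL, not just UNION. It is the correct expression of what we are doing, and it is cheaper. It also preserves the sort order from the CTE; UNION would try to remove duplicates, in vain, and might scramble that order.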
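The difference between the two operators is easy to see on constant rows. This is only an illustration with made-up values, not part of the solution:

```sql
-- UNION ALL appends the sets as they are: 3 rows.
SELECT count(*) AS union_all_rows
FROM  (SELECT 1 AS x UNION ALL SELECT 1 UNION ALL SELECT 2) s;

-- UNION adds a de-duplication step on top: 2 rows.
SELECT count(*) AS union_rows
FROM  (SELECT 1 AS x UNION SELECT 1 UNION SELECT 2) s;
```

In the solution above the two branches cannot overlap (rn <= 5 versus rn > 5), so the de-duplication step of UNION would be pure overhead.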
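For big tables, depending on data distribution, there may be (much) faster techniques ...
... for getting distinct categories:
- Optimize GROUP BY query to retrieve latest row per user
... for getting random rows:
- Best way to select random rows PostgreSQL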