Count unique rows GROUP(ed) BY different columns than used in DISTINCT ON

Question

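I'm sure this has been asked over and over, but I couldn't find a simple example that I could fully understand.

I'm trying to deduplicate on one column (performing a DISTINCT ON) and COUNT records GROUPed BY columns different from the one used for deduplication, without introducing a subquery.

Let's say I have a table with the following info: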

order_num	date	region	timestamp_updated
001	2021-09-01	Murica	2021-09-02T19:00:01Z
001	2021-09-01	Murica	2021-09-03T19:00:01Z
002	2021-09-01	Yurop	2021-09-02T19:00:01Z
003	2021-09-01	Yurop	2021-09-03T19:00:01Z
004	2021-09-02	Yurop	2021-09-03T19:00:01Z

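I'd like to first get the unique records with distinct order_num (keeping the most recently updated one) AND THEN count the orders grouped by date and region.

Deduplicated (gets rid of the oldest order_num='001'):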

order_num	date	region	timestamp_updated
001	2021-09-01	Murica	2021-09-03T19:00:01Z
002	2021-09-01	Yurop	2021-09-02T19:00:01Z
003	2021-09-01	Yurop	2021-09-03T19:00:01Z
004	2021-09-02	Yurop	2021-09-03T19:00:01Z

And then the grouped count:

date	region	count
2021-09-01	Murica	1
2021-09-01	Yurop	2
2021-09-02	Yurop	1

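I know how to do those two things separately (distinct on (order_num) + order by timestamp_updated desc to deduplicate, and then select count(*) + group by date, region), or even together with a subquery. But I'd like to avoid subqueries as much as possible, and this is where window functions (seem to) come in handy, and I don't know ~~much~~ anything about those.

The closest I've been able to get is the groups, but they show one record per order_num. The records are correct, but duplicated: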
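For reference, the subquery version mentioned above could be sketched roughly as follows. This is only a baseline for comparison, not the solution being asked for. It assumes the table is called orders_table (the name used in the attempt further down) and the alias latest is made up for illustration.

```sql
-- Baseline: deduplicate in a subquery, then aggregate the survivors.
-- Assumption: the table is named orders_table; "latest" is an illustrative alias.
SELECT date, region, count(*) AS count
FROM (
    -- DISTINCT ON keeps the first row per order_num in ORDER BY order,
    -- i.e. the most recently updated one.
    SELECT DISTINCT ON (order_num) order_num, date, region
    FROM orders_table
    ORDER BY order_num, timestamp_updated DESC
) AS latest
GROUP BY date, region
ORDER BY date, region;
```

Any single-query alternative should return the same three rows for the sample data.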

select distinct on (order_num) date, region, count(1) over (
    partition by order_num
)
from orders_table
order by order_num, timestamp_updated desc;

That query ^^ shows:

date	region	count
2021-09-01	Murica	1	I think this is the first 001
2021-09-01	Murica	1	I think this is the second 001
2021-09-01	Yurop	2	I think this is the first Yurop: 002
2021-09-01	Yurop	2	I think this is the second Yurop: 003
2021-09-02	Yurop	1

Answer 1

You can get the max timestamp_updated per order_num, date, region, and then use a window function to aggregate again and get the count per date, region:

select distinct 
       date, 
       region, 
       count(max(timestamp_updated)) over (partition by date, region) as counts 
from t
group by order_num, date, region;
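To try the query against the sample data from the question, a setup along these lines should work in PostgreSQL. The column types are an assumption (the question does not state them), and the final ORDER BY is added only to make the output order predictable.

```sql
-- Assumed schema: the question does not give column types.
CREATE TABLE t (
    order_num         text,
    date              date,
    region            text,
    timestamp_updated timestamptz
);

-- Sample rows copied from the question.
INSERT INTO t VALUES
    ('001', '2021-09-01', 'Murica', '2021-09-02T19:00:01Z'),
    ('001', '2021-09-01', 'Murica', '2021-09-03T19:00:01Z'),
    ('002', '2021-09-01', 'Yurop',  '2021-09-02T19:00:01Z'),
    ('003', '2021-09-01', 'Yurop',  '2021-09-03T19:00:01Z'),
    ('004', '2021-09-02', 'Yurop',  '2021-09-03T19:00:01Z');

-- GROUP BY collapses the rows to one per order; the window count then
-- counts those groups per (date, region); DISTINCT removes the repeats.
SELECT DISTINCT
       date,
       region,
       count(max(timestamp_updated)) OVER (PARTITION BY date, region) AS counts
FROM t
GROUP BY order_num, date, region
ORDER BY date, region;
```

Because the grouping is on order_num, date, region rather than on order_num alone, this matches the DISTINCT ON result only when date and region stay the same across all updates of an order, as they do in the sample data.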


sql

postgresql

group-by

count

distinct-on