Count unique rows GROUP(ed) BY different columns than used in DISTINCT ON
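I'm sure this question has been asked over and over, but I couldn't find a simple example that I could fully understand.
I'm trying to deduplicate on one column (doing a `DISTINCT ON`) and `COUNT` records `GROUP`ed `BY` columns different from the one used for deduplication, without introducing a subquery.
Let's say I have a table with the following information: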
| order_num | date | region | timestamp_updated |
|---|---|---|---|
| 001 | 2021-09-01 | Murica | 2021-09-02T19:00:01Z |
| 001 | 2021-09-01 | Murica | 2021-09-03T19:00:01Z |
| 002 | 2021-09-01 | Yurop | 2021-09-02T19:00:01Z |
| 003 | 2021-09-01 | Yurop | 2021-09-03T19:00:01Z |
| 004 | 2021-09-02 | Yurop | 2021-09-03T19:00:01Z |
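I want to first get unique records with distinct `order_num` (keeping the most recently updated one), and then count the orders grouped by `date` and `region`.
Deduplicate (dropping the oldest `order_num='001'` row):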
| order_num | date | region | timestamp_updated |
|---|---|---|---|
| 001 | 2021-09-01 | Murica | 2021-09-03T19:00:01Z |
| 002 | 2021-09-01 | Yurop | 2021-09-02T19:00:01Z |
| 003 | 2021-09-01 | Yurop | 2021-09-03T19:00:01Z |
| 004 | 2021-09-02 | Yurop | 2021-09-03T19:00:01Z |
Then group and count:
| date | region | count |
|---|---|---|
| 2021-09-01 | Murica | 1 |
| 2021-09-01 | Yurop | 2 |
| 2021-09-02 | Yurop | 1 |
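I know how to do the two things separately (`distinct on (order_num)` + `order by timestamp_updated desc` to deduplicate, then `select count(*)` + `group by date, region`), or even together with a subquery. But I'd like to avoid subqueries if possible. This is where window functions (seem to) come in handy, and I don't know much of anything about those.
The closest I've been able to get is the groups, but they show one record per `order_num`. The records are correct, but duplicated: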
    select distinct on (order_num)
        date,
        region,
        count(1) over (partition by order_num)
    from orders_table
    order by order_num, timestamp_updated desc;
That query ^^ shows:
| date | region | count | |
|---|---|---|---|
| 2021-09-01 | Murica | 1 | I think this is the first 001 |
| 2021-09-01 | Murica | 1 | I think this is the second 001 |
| 2021-09-01 | Yurop | 2 | I think this is the first Yurop: 002 |
| 2021-09-01 | Yurop | 2 | I think this is the second Yurop: 003 |
| 2021-09-02 | Yurop | 1 | |
You can get the max `timestamp_updated` for each `order_num, date, region`, and then aggregate again with a window function to get the count for each `date, region`:
    select distinct
        date,
        region,
        count(max(timestamp_updated)) over (partition by date, region) as counts
    from t
    group by order_num, date, region;
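To try the query above on the sample data without a PostgreSQL server, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. It rests on some assumptions: the bundled SQLite is 3.25 or newer (needed for window functions), all columns are stored as text, and the helper name `counts_per_date_region` is made up for illustration. The table name `t` is the one the query already uses. The rows are sorted in Python because the query has no `ORDER BY`.

```python
import sqlite3

# Sample rows from the question: (order_num, date, region, timestamp_updated).
SAMPLE_ROWS = [
    ("001", "2021-09-01", "Murica", "2021-09-02T19:00:01Z"),
    ("001", "2021-09-01", "Murica", "2021-09-03T19:00:01Z"),
    ("002", "2021-09-01", "Yurop", "2021-09-02T19:00:01Z"),
    ("003", "2021-09-01", "Yurop", "2021-09-03T19:00:01Z"),
    ("004", "2021-09-02", "Yurop", "2021-09-03T19:00:01Z"),
]

# The query from the answer, unchanged apart from layout.
QUERY = """
select distinct
    date,
    region,
    count(max(timestamp_updated)) over (partition by date, region) as counts
from t
group by order_num, date, region
"""


def counts_per_date_region(rows):
    """Hypothetical helper: load rows into an in-memory table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumption: every column is stored as text.
        conn.execute(
            "create table t ("
            "order_num text, date text, region text, timestamp_updated text)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?)", rows)
        # The query has no ORDER BY, so sort here for a stable result.
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


result = counts_per_date_region(SAMPLE_ROWS)
for row in result:
    print(row)
```

On the sample data this should give one row per `date, region` pair, matching the "group and count" table in the question. Note that the query groups by `order_num, date, region`, so an order whose `date` or `region` differs between updates would be counted once for each combination rather than only for its latest row.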