Snowflake SQL rows with minimum and maximum values for each partition
I need to find the maximum and minimum values of a summed column over the partitions of a table.
The inner query is:
select
ss_store_sk,
d.d_year,
d.d_moy,
sum(ss_quantity) as total_sales,
rank() over (partition by ss_store_sk order by sum(ss_quantity) desc) as "rank"
from store_sales
join date_dim as d on d.d_date_sk = ss_sold_date_sk
where d.d_year != 2003 and d.d_moy != 1
group by
ss_store_sk,
d.d_year,
d.d_moy
This produces a table like the following.
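SS_STORE_SK | D_YEAR | D_MOY | TOTAL_SALES | rank |
---|---|---|---|---|
182 | 1999 | 12 | 60836090 | 1 |
182 | 1998 | 11 | 60792623 | 2 |
182 | 2001 | 10 | 60615582 | 3 |
182 | 2000 | 9 | 60459371 | 4 |
18 | 1998 | 12 | 232323 | 1 |
18 | 2001 | 11 | 123244 | 2 |
18 | 2000 | 10 | 3422 | 3 |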
I can get the rows with the maximum TOTAL_SALES like this:
with minmax as (
inner query
)
select * from minmax where "rank" =1
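But how do I get the row with the minimum TOTAL_SALES for each SS_STORE_SK? The result I need is shown below. That said, being able to get the lowest-ranked TOTAL_SALES rows on their own would be enough.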
SS_STORE_SK | D_YEAR | D_MOY | TOTAL_SALES | rank |
---|---|---|---|---|
182 | 1999 | 12 | 60836090 | 1 |
182 | 2000 | 9 | 60459371 | 4 |
18 | 1998 | 12 | 232323 | 1 |
18 | 2000 | 10 | 3422 | 3 |
I am using Snowflake SQL.
Use the max() window function:
select ss_store_sk, d.d_year, d.d_moy,
sum(ss_quantity) as total_sales,
rank() over (partition by ss_store_sk order by sum(ss_quantity) desc) as "rank",
max(sum(ss_quantity)) over (partition by ss_store_sk)
from store_sales join
date_dim d
on d.d_date_sk = ss_sold_date_sk
where d.d_year <> 2003 and d.d_moy <> 1
group by ss_store_sk, d.d_year, d.d_moy;
Of course, if you want the minimum, you can use min().
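The query above only adds the per-store maximum as an extra column; it does not filter anything. As an untested sketch of how the same windowed aggregates could be turned into a filter, each row's total can be compared against the store's max() and min() in a QUALIFY clause. QUALIFY is Snowflake-specific, and the table and column names are the ones from the question.

```sql
select ss_store_sk, d.d_year, d.d_moy,
       sum(ss_quantity) as total_sales
from store_sales
join date_dim d
  on d.d_date_sk = ss_sold_date_sk
where d.d_year <> 2003 and d.d_moy <> 1
group by ss_store_sk, d.d_year, d.d_moy
-- keep a row when its total equals the store's largest or smallest total
qualify sum(ss_quantity) = max(sum(ss_quantity)) over (partition by ss_store_sk)
     or sum(ss_quantity) = min(sum(ss_quantity)) over (partition by ss_store_sk);
```

If several months tie for the largest or smallest total in a store, all of the tied rows are returned.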
Use two rank() window functions: one ordered by sum(ss_quantity) descending and one ordered ascending. Then select the rows that come first in either ranking.
with minmax as
(
select
ss_store_sk,
d.d_year,
d.d_moy,
sum(ss_quantity) as total_sales,
rank() over (partition by ss_store_sk order by sum(ss_quantity) desc) as "rank",
rank() over (partition by ss_store_sk order by sum(ss_quantity) ) as "rank2"
from store_sales
join date_dim as d on d.d_date_sk = ss_sold_date_sk
where d.d_year != 2003 and d.d_moy != 1
group by
ss_store_sk,
d.d_year,
d.d_moy
)
select * from minmax where "rank" = 1 or "rank2" = 1
A more concise way to filter rows based on window functions:
select
ss_store_sk,
d.d_year,
d.d_moy,
sum(ss_quantity) as total_sales
from store_sales
join date_dim as d on d.d_date_sk = ss_sold_date_sk
where d.d_year != 2003 and d.d_moy != 1
group by ss_store_sk, d.d_year, d.d_moy
qualify rank() over (partition by ss_store_sk order by total_sales desc) = 1
or rank() over (partition by ss_store_sk order by total_sales) = 1
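To check the QUALIFY filter without access to the real tables, the aggregated sample rows from the question can be fed in through a VALUES list. This is only a sketch: the CTE name `monthly` and the alias `t` are made up for illustration, and the rows stand in for the output of the grouped query.

```sql
with monthly as (
    -- sample rows copied from the question's table (already aggregated)
    select *
    from (values
        (182, 1999, 12, 60836090),
        (182, 1998, 11, 60792623),
        (182, 2001, 10, 60615582),
        (182, 2000,  9, 60459371),
        (18,  1998, 12,   232323),
        (18,  2001, 11,   123244),
        (18,  2000, 10,     3422)
    ) as t (ss_store_sk, d_year, d_moy, total_sales)
)
select *
from monthly
-- keep the highest and the lowest total per store
qualify rank() over (partition by ss_store_sk order by total_sales desc) = 1
     or rank() over (partition by ss_store_sk order by total_sales) = 1
order by ss_store_sk desc, total_sales desc;
```

On this data the filter should keep the four rows listed in the desired result: the 1999/12 and 2000/9 rows for store 182, and the 1998/12 and 2000/10 rows for store 18.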