Snowflake SQL rows with minimum and maximum values for each partition

I need to find the maximum and minimum values of a summed column within each partition of a table.

The inner query is:

select 
ss_store_sk,
d.d_year,
d.d_moy,
sum(ss_quantity) as total_sales,
rank() over (partition by ss_store_sk order by sum(ss_quantity) desc) as "rank"
from store_sales
join date_dim as d on d.d_date_sk = ss_sold_date_sk
where d.d_year != 2003 and d.d_moy != 1
group by 
ss_store_sk,
d.d_year,
d.d_moy

This produces a table like the one below.

SS_STORE_SK D_YEAR D_MOY TOTAL_SALES rank
182 1999 12 60836090 1
182 1998 11 60792623 2
182 2001 10 60615582 3
182 2000 9 60459371 4
18 1998 12 232323 1
18 2001 11 123244 2
18 2000 10 3422 3

I can get the rows with the maximum TOTAL_SALES like this:

with minmax as (
inner query
)
select * from minmax where "rank" =1

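But how do I get the rows with the minimum TOTAL_SALES for each SS_STORE_SK? The result I need is shown below, although getting the minimum-ranked TOTAL_SALES rows separately would also be enough.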

SS_STORE_SK D_YEAR D_MOY TOTAL_SALES rank
182 1999 12 60836090 1
182 2000 9 60459371 4
18 1998 12 232323 1
18 2000 10 3422 3

I am using Snowflake SQL.

Use the max() window function:

select ss_store_sk, d.d_year, d.d_moy,
       sum(ss_quantity) as total_sales,
       rank() over (partition by ss_store_sk order by sum(ss_quantity) desc) as "rank",
       max(sum(ss_quantity)) over (partition by ss_store_sk)
from store_sales join
     date_dim d
     on d.d_date_sk = ss_sold_date_sk
where d.d_year <> 2003 and d.d_moy <> 1
group by ss_store_sk, d.d_year, d.d_moy;

Of course, if you want the minimum, you can use min() in the same way.
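To return only the rows that hold each store's highest and lowest totals, the max() and min() window aggregates can be compared against the grouped sum in a QUALIFY clause. The following is a minimal sketch, assuming the same store_sales and date_dim tables as in the question; it has not been run against the asker's data.

```sql
-- Sketch: keep only each store's highest and lowest monthly totals.
-- Assumes the store_sales and date_dim tables from the question.
select ss_store_sk, d.d_year, d.d_moy,
       sum(ss_quantity) as total_sales
from store_sales
join date_dim d on d.d_date_sk = ss_sold_date_sk
where d.d_year <> 2003 and d.d_moy <> 1
group by ss_store_sk, d.d_year, d.d_moy
-- QUALIFY filters on window-function results after grouping.
qualify sum(ss_quantity) = max(sum(ss_quantity)) over (partition by ss_store_sk)
     or sum(ss_quantity) = min(sum(ss_quantity)) over (partition by ss_store_sk);
```

If several months tie for a store's maximum or minimum, all of the tied rows are returned.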

Use the rank() window function twice: once ordered by sum(ss_quantity) descending and once ascending. Then select the rows that rank first in either ordering.

with minmax as
(
   select 
   ss_store_sk,
   d.d_year,
   d.d_moy,
   sum(ss_quantity) as total_sales,
   rank() over (partition by ss_store_sk order by sum(ss_quantity) desc) as "rank",
   rank() over (partition by ss_store_sk order by sum(ss_quantity) ) as "rank2"
   from store_sales
   join date_dim as d on d.d_date_sk = ss_sold_date_sk
   where d.d_year != 2003 and d.d_moy != 1
   group by 
   ss_store_sk,
   d.d_year,
   d.d_moy
)
select * from minmax where "rank" = 1 or "rank2" = 1

A more concise way to filter rows based on window functions is the QUALIFY clause:

select 
  ss_store_sk,
  d.d_year,
  d.d_moy,
  sum(ss_quantity) as total_sales
from store_sales
join date_dim as d on d.d_date_sk = ss_sold_date_sk
where d.d_year != 2003 and d.d_moy != 1
group by ss_store_sk, d.d_year, d.d_moy
qualify rank() over (partition by ss_store_sk order by total_sales desc) = 1
     or rank() over (partition by ss_store_sk order by total_sales) = 1
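Note that rank() returns every tied row, so a store with two months sharing the same top or bottom total yields more than two rows. If exactly one row per extreme is wanted, row_number() with a tiebreaker is one option. The sketch below assumes the same tables as above, and the choice of d_year and d_moy as tiebreaker columns is only an illustration.

```sql
-- Sketch: at most one max row and one min row per store, even with ties.
-- The tiebreaker (earliest year, then earliest month) is an assumption.
select
  ss_store_sk,
  d.d_year,
  d.d_moy,
  sum(ss_quantity) as total_sales
from store_sales
join date_dim as d on d.d_date_sk = ss_sold_date_sk
where d.d_year != 2003 and d.d_moy != 1
group by ss_store_sk, d.d_year, d.d_moy
qualify row_number() over (partition by ss_store_sk
                           order by sum(ss_quantity) desc, d.d_year, d.d_moy) = 1
     or row_number() over (partition by ss_store_sk
                           order by sum(ss_quantity), d.d_year, d.d_moy) = 1;
```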