How to use Rank() in BigQuery and restart Rank counting by timeframe?

Question

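In BigQuery, I want to build a ranked list of SKUs by sales within each season, but I'm a bit lost on how to do that starting from my code in its current state. The output I'm after looks like this: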

Season      | SKU         | total_spent | Rank 
Spring 2020 | SKU_sample1 | 0 | 1
Spring 2020 | SKU_sample2 | 0 | 2
Spring 2020 | SKU_sample3 | 5 | 3

--- and so on, with the rank restarting when the season changes

Season      | SKU         | total_spent | Rank 
Halloween 2020 | SKU_sample1 | 0 | 1
Halloween 2020 | SKU_sample2 | 0 | 2
Halloween 2020 | SKU_sample3 | 0 | 3

My base code looks like this:

SELECT DATE(sales_time) as sales_time,
    CASE
        WHEN DATE(sales_time) >= '2020-04-09' AND DATE(sales_time) <= '2020-04-23' THEN 'Spring 2020'
        WHEN DATE(sales_time) >= '2020-10-29' AND DATE(sales_time) <= '2020-11-02' THEN 'Halloween 2020'
        WHEN DATE(sales_time) >= '2020-11-25' AND DATE(sales_time) <= '2020-12-03' THEN 'Thanksgiving 2020'
        WHEN DATE(sales_time) >= '2020-12-17' AND DATE(sales_time) <= '2021-01-04' THEN 'Xmas 2020'
      ELSE
      'unknown_season'
    END
      AS season,
    sku,
    SUM(salesPrice) as total_spent
    FROM sales_table
    WHERE
       DATE(sales_time) >= '2020-04-09' AND DATE(sales_time) <= '2020-04-23' 
        OR (DATE(sales_time) >= '2020-10-29' AND DATE(sales_time) <= '2020-11-02')
        OR (DATE(sales_time) >= '2020-11-25' AND DATE(sales_time) <= '2020-12-03')
        OR (DATE(sales_time) >= '2020-12-17' AND DATE(sales_time) <= '2021-01-04')
    GROUP BY  sku,
    DATE(sales_time),
    salesPrice,
    season

Answer 1

I broke it apart a bit using CTEs to encapsulate the ranking logic:

with data as (
    select
        sku,
        case
            when date(sales_time) between '2020-04-09' and '2020-04-23' then 'Spring 2020'
            when date(sales_time) between '2020-10-29' and '2020-11-02' then 'Halloween 2020'
            when date(sales_time) between '2020-11-25' and '2020-12-03' then 'Thanksgiving 2020'
            when date(sales_time) between '2020-12-17' and '2021-01-04' then 'Xmas 2020'
            else 'unknown_season'
        end as season,
        sum(salesPrice) as total_spent
    from `project.dataset.sales_table`
    where
        (
            date(sales_time) between '2020-04-09' and '2020-04-23' or
            date(sales_time) between '2020-10-29' and '2020-11-02' or
            date(sales_time) between '2020-11-25' and '2020-12-03' or
            date(sales_time) between '2020-12-17' and '2021-01-04'
        )
    group by 1, 2
),
ranked as (
    select 
        season, 
        sku, 
        total_spent,
        -- Within each season, rank by total_spent -- could also use row_number() if you want to break ties
        rank() over(partition by season order by total_spent desc) as spend_rank
    from data
)
select * from ranked
order by season, spend_rank asc
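
To see how the partition restarts the count and how ties behave, here is a small self-contained sketch. The inline rows are made-up sample values (not taken from the real table), and the aliases rank_with_gaps, rank_no_gaps and unique_position are illustrative names only:

```sql
-- Hypothetical sample data standing in for the aggregated `data` CTE above.
with data as (
    select 'Spring 2020' as season, 'SKU_sample1' as sku, 50 as total_spent union all
    select 'Spring 2020', 'SKU_sample2', 50 union all
    select 'Spring 2020', 'SKU_sample3', 20 union all
    select 'Halloween 2020', 'SKU_sample1', 30 union all
    select 'Halloween 2020', 'SKU_sample2', 10
)
select
    season,
    sku,
    total_spent,
    -- rank(): tied rows share a rank and the following rank is skipped
    rank() over (partition by season order by total_spent desc) as rank_with_gaps,
    -- dense_rank(): tied rows share a rank and no rank is skipped
    dense_rank() over (partition by season order by total_spent desc) as rank_no_gaps,
    -- row_number(): every row gets a distinct number; sku is added as a tie-breaker
    row_number() over (partition by season order by total_spent desc, sku) as unique_position
from data
order by season, rank_with_gaps, sku
```

With these sample rows, the two Spring 2020 SKUs tied at 50 both get rank 1; the third SKU gets 3 under rank() and 2 under dense_rank(), while row_number() assigns 1, 2, 3. The numbering starts again at 1 for Halloween 2020 because of the partition by season.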

Answer 2

Consider the option below:

#standardSQL
select season, sku, total_spent, 
  rank() over(partition by season order by total_spent desc) as `rank`
from (
  select min(date(sales_time)) season_start, sku, sum(salesPrice) as total_spent, 
    case 
      when date(sales_time) between '2020-04-09' and '2020-04-23' then 'Spring 2020'
      when date(sales_time) between '2020-10-29' and '2020-11-02' then 'Halloween 2020'
      when date(sales_time) between '2020-11-25' and '2020-12-03' then 'Thanksgiving 2020'
      when date(sales_time) between '2020-12-17' and '2021-01-04' then 'Xmas 2020'
      else 'unknown_season' 
    end as season  
  from `project.dataset.sales_table`
  group by season, sku
)
order by season_start, `rank`

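
One possible variation, sketched under assumptions rather than offered as the definitive approach: keep the season date ranges in a small lookup CTE and join on them, so each range is written once instead of being repeated in both a CASE and a WHERE as in the question. The CTE names seasons and totals and the columns start_date and end_date are hypothetical; the table and column names are the ones used above. Because this is an inner join, sales that fall outside every range are dropped rather than labelled 'unknown_season'.

```sql
#standardSQL
with seasons as (
    -- Hypothetical lookup of season names and their inclusive date ranges
    select 'Spring 2020' as season, date '2020-04-09' as start_date, date '2020-04-23' as end_date union all
    select 'Halloween 2020', date '2020-10-29', date '2020-11-02' union all
    select 'Thanksgiving 2020', date '2020-11-25', date '2020-12-03' union all
    select 'Xmas 2020', date '2020-12-17', date '2021-01-04'
),
totals as (
    -- One row per season and SKU with the summed sales
    select s.season, s.start_date, t.sku, sum(t.salesPrice) as total_spent
    from `project.dataset.sales_table` as t
    join seasons as s
        on date(t.sales_time) between s.start_date and s.end_date
    group by s.season, s.start_date, t.sku
)
select
    season,
    sku,
    total_spent,
    -- The rank restarts at 1 for every season
    rank() over (partition by season order by total_spent desc) as spend_rank
from totals
order by start_date, spend_rank
```

Ordering by start_date lists the seasons chronologically, which is the same purpose season_start serves in the query above.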
