How to use Rank() in BigQuery and restart Rank counting by timeframe?
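In BigQuery, I'd like to create a ranked list of SKU sales for each season, but I'm a bit lost on how to do that given the current state of my code. My desired output looks like this: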
Season | SKU | total_spent | Rank
Spring 2020 | SKU_sample1 | 0 | 1
Spring 2020 | SKU_sample2 | 0 | 2
Spring 2020 | SKU_sample3 | 5 | 3
--- and so on, with the ranking restarting when the season changes
Season | SKU | total_spent | Rank
Halloween 2020 | SKU_sample1 | 0 | 1
Halloween 2020 | SKU_sample2 | 0 | 2
Halloween 2020 | SKU_sample3 | 0 | 3
My base code looks like this:
SELECT DATE(sales_time) as sales_time,
CASE
WHEN DATE(sales_time) >= '2020-04-09' AND DATE(sales_time) <= '2020-04-23' THEN 'Spring 2020'
WHEN DATE(sales_time) >= '2020-10-29' AND DATE(sales_time) <= '2020-11-02' THEN 'Halloween 2020'
WHEN DATE(sales_time) >= '2020-11-25' AND DATE(sales_time) <= '2020-12-03' THEN 'Thanksgiving 2020'
WHEN DATE(sales_time) >= '2020-12-17' AND DATE(sales_time) <= '2021-01-04' THEN 'Xmas 2020'
ELSE
'unknown_season'
END
AS season,
sku,
SUM(salesPrice) as total_spent
FROM sales_table
WHERE
DATE(sales_time) >= '2020-04-09' AND DATE(sales_time) <= '2020-04-23'
OR (DATE(sales_time) >= '2020-10-29' AND DATE(sales_time) <= '2020-11-02')
OR (DATE(sales_time) >= '2020-11-25' AND DATE(sales_time) <= '2020-12-03')
OR (DATE(sales_time) >= '2020-12-17' AND DATE(sales_time) <= '2021-01-04')
GROUP BY sku,
DATE(sales_time),
salesPrice,
season
I broke it up a bit using CTEs to encapsulate the ranking logic:
with data as (
select
sku,
case
when date(sales_time) between '2020-04-09' and '2020-04-23' then 'Spring 2020'
when date(sales_time) between '2020-10-29' and '2020-11-02' then 'Halloween 2020'
when date(sales_time) between '2020-11-25' and '2020-12-03' then 'Thanksgiving 2020'
when date(sales_time) between '2020-12-17' and '2021-01-04' then 'Xmas 2020'
else 'unknown_season'
end as season,
sum(salesPrice) as total_spent
from `project.dataset.sales_table`
where
(
date(sales_time) between '2020-04-09' and '2020-04-23' or
date(sales_time) between '2020-10-29' and '2020-11-02' or
date(sales_time) between '2020-11-25' and '2020-12-03' or
date(sales_time) between '2020-12-17' and '2021-01-04'
)
group by 1,2
),
ranked as (
select
season,
sku,
total_spent,
-- Within each season, rank by total_spent -- could also use row_number() if you want to break ties
rank() over(partition by season order by total_spent desc) as spend_rank
from data
)
select * from ranked
order by season, spend_rank asc
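To make the tie-breaking remark in the comment above concrete, here is a minimal sketch that runs without any table. The rows in `data` are made-up values, not anything from the real sales table, and the extra column names (`spend_dense_rank`, `spend_row_number`) are just illustrative.

```sql
-- Hypothetical, hand-written totals; two Spring SKUs are tied on purpose.
with data as (
  select * from unnest([
    struct('Spring 2020' as season, 'SKU_sample1' as sku, 50 as total_spent),
    ('Spring 2020', 'SKU_sample2', 50),
    ('Spring 2020', 'SKU_sample3', 5),
    ('Halloween 2020', 'SKU_sample1', 30),
    ('Halloween 2020', 'SKU_sample2', 10)
  ])
)
select
  season,
  sku,
  total_spent,
  -- ties share a rank and the next rank is skipped
  rank() over w as spend_rank,
  -- ties share a rank and no rank is skipped
  dense_rank() over w as spend_dense_rank,
  -- every row gets its own number; sku is added as a tie-breaker so the result is deterministic
  row_number() over (partition by season order by total_spent desc, sku) as spend_row_number
from data
window w as (partition by season order by total_spent desc)
order by season, spend_row_number
```

With these rows, the two tied Spring SKUs both get `spend_rank` 1 and the third gets 3, while `dense_rank()` gives that third SKU 2 and `row_number()` numbers them 1, 2, 3. In every column the numbering starts over at 1 for Halloween, because the window is partitioned by `season`.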
Consider the option below:
#standardSQL
select season, sku, total_spent,
rank() over(partition by season order by total_spent desc) as `rank`
from (
select min(date(sales_time)) season_start, sku, sum(salesPrice) as total_spent,
case
when date(sales_time) between '2020-04-09' and '2020-04-23' then 'Spring 2020'
when date(sales_time) between '2020-10-29' and '2020-11-02' then 'Halloween 2020'
when date(sales_time) between '2020-11-25' and '2020-12-03' then 'Thanksgiving 2020'
when date(sales_time) between '2020-12-17' and '2021-01-04' then 'Xmas 2020'
else 'unknown_season'
end as season
from `project.dataset.sales_table`
group by season, sku
)
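-- season_start is computed per (season, sku), so sort by the earliest date of the whole season first, then by rank
order by min(season_start) over(partition by season), `rank`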
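Both answers repeat the season date ranges inside the query. One possible variation, sketched here under assumptions, keeps the ranges in a small lookup CTE and joins against it, which also gives a per-season start date to sort on. The `seasons`, `sales` and `totals` CTE names and the inline sales rows are hypothetical; `sales` only stands in for `project.dataset.sales_table`, and `sales_time` is assumed to be a TIMESTAMP.

```sql
with seasons as (
  -- date ranges copied from the question
  select * from unnest([
    struct('Spring 2020' as season, date '2020-04-09' as start_date, date '2020-04-23' as end_date),
    ('Halloween 2020', date '2020-10-29', date '2020-11-02'),
    ('Thanksgiving 2020', date '2020-11-25', date '2020-12-03'),
    ('Xmas 2020', date '2020-12-17', date '2021-01-04')
  ])
),
sales as (
  -- hypothetical rows standing in for `project.dataset.sales_table`
  select * from unnest([
    struct(timestamp '2020-04-10 12:00:00' as sales_time, 'SKU_sample1' as sku, 20.0 as salesPrice),
    (timestamp '2020-04-11 09:30:00', 'SKU_sample1', 15.0),
    (timestamp '2020-04-12 18:45:00', 'SKU_sample2', 5.0),
    (timestamp '2020-10-30 10:00:00', 'SKU_sample2', 40.0),
    (timestamp '2020-10-31 20:15:00', 'SKU_sample1', 10.0),
    (timestamp '2020-07-01 08:00:00', 'SKU_sample3', 99.0)  -- outside every season, dropped by the join
  ])
),
totals as (
  -- the inner join both labels each sale with its season and filters out off-season sales
  select s.season, s.start_date, sales.sku, sum(sales.salesPrice) as total_spent
  from sales
  join seasons as s
    on date(sales.sales_time) between s.start_date and s.end_date
  group by s.season, s.start_date, sales.sku
)
select
  season,
  sku,
  total_spent,
  rank() over (partition by season order by total_spent desc) as spend_rank
from totals
order by start_date, spend_rank
```

With the sample rows above, Spring 2020 comes first with SKU_sample1 (35.0) ranked 1 and SKU_sample2 (5.0) ranked 2, followed by Halloween 2020 with SKU_sample2 (40.0) ranked 1 and SKU_sample1 (10.0) ranked 2. Whether a lookup like this is worth it depends on how many seasons you have and where you would rather maintain them.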