SQL: Deduce Effective Pricing from overlapping Dates

Question

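I have pricing records with overlapping date ranges. For a few dates, more than one price applies. See the example below:

For example, 2022/02/15 has two prices, 10 and 8.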

article	price	startdate	enddate
123	10	2022/02/02	2049/12/31
123	8	2022/02/14	2022/09/14
123	5	2022/03/14	2022/04/06
123	4	2022/04/11	2022/04/27

I want to apply the effective price to date ranges as shown below, so that the output contains no price conflicts.

article	price	startdate	enddate
123	10	2022/02/02	2022/02/13
123	8	2022/02/14	2022/03/13
123	5	2022/03/14	2022/04/06
123	8	2022/04/07	2022/04/10
123	4	2022/04/11	2022/04/27
123	8	2022/04/28	2022/09/14
123	10	2022/09/15	2049/12/31

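I can think of using window functions to adjust the end dates and prices, but I can't work it out into a complete solution. Any suggestions or solutions are appreciated.

Database: Snowflake

Thanks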

Answer 1

This uses the logic that, within an overlapping window, the price with the most recent start date wins.

Date version:

with data(article,price,startdate,enddate) as (
    select * FROM VALUES
        (123, 10, '2022-02-02'::date, '2049-12-31'::date),
        (123, 8,  '2022-02-14'::date, '2022-09-14'::date),
        (123, 5,  '2022-03-14'::date, '2022-04-06'::date),
        (123, 4,  '2022-04-11'::date, '2022-04-27'::date)
), dis_times as (
    select article,
        date as startdate,
        lead(date) over(partition by article order by date)-1 as enddate
    from (
        select distinct article, startdate as date from data
        union
        select distinct article, enddate+1 as date from data
    )
    qualify enddate is not null
)
select 
    d1.article, 
    d1.price, 
    d2.startdate as s_startdate,
    d2.enddate as s_enddate
from data as d1
join dis_times as d2
    on d1.article = d2.article 
        and d2.startdate between d1.startdate and d1.enddate
-- for each segment, keep the price whose range started most recently
qualify row_number() over (partition by d1.article, s_startdate order by d1.startdate desc) = 1
order by 1,3;

Gives:

ARTICLE	PRICE	S_STARTDATE	S_ENDDATE
123	10	2022-02-02	2022-02-13
123	8	2022-02-14	2022-03-13
123	5	2022-03-14	2022-04-06
123	8	2022-04-07	2022-04-10
123	4	2022-04-11	2022-04-27
123	8	2022-04-28	2022-09-14
123	10	2022-09-15	2049-12-31
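
To see where these segments come from, it can help to run the boundary step on its own. The following is a minimal sketch against the same sample data; the names cut_dates and cut_date are introduced here for illustration and are not part of the answer above. Every start date, and every day after an end date, is treated as a point where the effective price may change.

```sql
with data(article, price, startdate, enddate) as (
    select * from values
        (123, 10, '2022-02-02'::date, '2049-12-31'::date),
        (123, 8,  '2022-02-14'::date, '2022-09-14'::date),
        (123, 5,  '2022-03-14'::date, '2022-04-06'::date),
        (123, 4,  '2022-04-11'::date, '2022-04-27'::date)
), cut_dates as (
    -- union (not union all) removes duplicate boundaries
    select article, startdate as cut_date from data
    union
    select article, enddate + 1 from data
)
select
    article,
    cut_date as s_startdate,
    -- each segment ends the day before the next boundary
    lead(cut_date) over (partition by article order by cut_date) - 1 as s_enddate
from cut_dates
-- the last boundary has no successor, so it opens no segment
qualify s_enddate is not null
order by 1, 2;
```

With eight distinct boundaries for article 123, this should yield the seven non-overlapping date ranges shown in the result above, without prices. The join in the full query then only has to pick one price per segment.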

Continuous timestamp version:

with data(article,price,startdate,enddate) as (
    select * FROM VALUES
        (123, 10, '2022-02-02'::date, '2049-12-31'::date),
        (123, 8,  '2022-02-14'::date, '2022-09-14'::date),
        (123, 5,  '2022-03-14'::date, '2022-04-06'::date),
        (123, 4,  '2022-04-11'::date, '2022-04-27'::date)
), dis_times as (
    select article,
        date as startdate,
        lead(date) over(partition by article order by date) as enddate
    from (
        select distinct article, startdate as date from data
        union
        select distinct article, enddate as date from data
    )
    qualify enddate is not null
)
select 
    d1.article, 
    d1.price, 
    d2.startdate as s_startdate,
    d2.enddate as s_enddate
from data as d1
join dis_times as d2
    on d1.article = d2.article 
        -- half-open match: a range covers its start but not its end
        and d2.startdate >= d1.startdate and d2.startdate < d1.enddate
qualify row_number() over (partition by d1.article, s_startdate order by d1.startdate desc) = 1
order by 1,3;

Gives:

ARTICLE	PRICE	S_STARTDATE	S_ENDDATE
123	10	2022-02-02	2022-02-14
123	8	2022-02-14	2022-03-14
123	5	2022-03-14	2022-04-06
123	8	2022-04-06	2022-04-11
123	4	2022-04-11	2022-04-27
123	8	2022-04-27	2022-09-14
123	10	2022-09-14	2049-12-31

Thanks to MatBailie for suggesting the tighter join:

join dis_times as d2
    on d1.article = d2.article 
        and d2.startdate between d1.startdate and d1.enddate

For continuous ranges, I would normally keep the BETWEEN and add a strict upper bound, like this:

and d2.startdate between d1.startdate and d1.enddate and d2.startdate < d1.enddate

rather than this form:

and d2.startdate >= d1.startdate and d2.startdate < d1.enddate

because in my experience it performs better. Always test against your own data and query complexity.

Answer 2

The first thing I did was convert your price-per-date-range data into a price-per-date lookup table.

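create or replace temporary table price_date_lookup as
select distinct 
       article,
       -- b.index runs from 1 to n+1, giving every day from start_date through end_date
       dateadd('day',b.index-1,start_date) as dates,
       -- when several ranges cover the same day, take the price of the range that ends first
       first_value(price) over (partition by article, dates order by end_date) as price
from my_table, 
     -- a string of n dots splits into n+1 pieces: one row per day in the range
     lateral split_to_table(repeat('.',datediff(day,start_date,end_date)), '.') b;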

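Notes:

first_value handles the overlaps by overriding the price based on the end date: for each date, the range that ends soonest wins.
lateral ... basically helps create a date column containing every date within the range.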
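
The date expansion is easier to follow when run on a single range by itself. This is a small sketch with a hypothetical range, 2022-04-07 through 2022-04-10, chosen only for illustration: a string of n dots split on '.' produces n + 1 pieces, so a range spanning n days of difference produces one row per calendar day, end date included.

```sql
select
    b.index,
    -- index starts at 1, so index - 1 is the day offset from the range start
    dateadd('day', b.index - 1, '2022-04-07'::date) as dates
from table(
    split_to_table(
        -- datediff here is 3, so the string is '...'
        repeat('.', datediff(day, '2022-04-07'::date, '2022-04-10'::date)),
        '.'
    )
) as b
order by b.index;
```

This should return four rows, one for each day from 2022-04-07 through 2022-04-10. Keep in mind that a long range, such as one ending on 2049-12-31, expands to one row per day, so the lookup table can become large.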

Once I had created the table, I figured the rest could be handled like a gaps-and-islands problem.

with cte1 as (
    select *,
           case when lag(price) over (partition by article order by dates) = price then 0 else 1 end as price_start -- flag the start of a new price island
    from price_date_lookup
),
cte2 as (
    select *,
           sum(price_start) over (partition by article order by dates) as price_id -- assign an id to each price island
    from cte1
)
select article, 
       price,
       min(dates) as start_date,
       max(dates) as end_date
from cte2
group by article, price, price_id;
