Group Rows Over Multiple Date Ranges

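With this question I'm trying to solve what I think are (at least) two problems at once. Existing answers may already cover parts of it, but I don't fundamentally understand which concepts are involved in getting my data into the final form I want.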

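Problem: I have three kinds of data (let's call them Gas, Liquid and Solid) that occur over a longer span of time, which we'll call the observation period. The requirement is to show the GasPeriod data for the observation period where it exists, falling back to Liquid, then to Solid. For a given state there will only ever be 0 or 1 active records.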

Gas  X----X       X----X    X---------X 1
Liq  X-------X    X--------XX---------X 2
Sol  X--------------XX----------------X 3

Need 1----12-23--31----12--21---------1

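What I need is to reduce these 8 ranges (3 GasPeriod, 3 LiquidPeriod, 2 SolidPeriod) to 6 rows: 6 date ranges, each carrying the data from the "winning" row, so that the winner's PeriodTemp and Description are kept for each overlapping stretch of time.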
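Stated as an algorithm, this is a priority-based flattening of overlapping intervals: clip every period to the observation period, cut the timeline wherever some period starts or ends, give each resulting segment to the highest-priority period active on it (Gas = 1, Liquid = 2, Solid = 3), and merge neighbouring segments won by the same record. Below is a minimal Python sketch of that idea, not T-SQL. It assumes Effective and Expiry are both inclusive, as the sample data suggests; `Period`, `flatten` and `SAMPLE` are made-up names for illustration.

```python
from datetime import date, timedelta
from typing import NamedTuple

ONE_DAY = timedelta(days=1)


class Period(NamedTuple):
    priority: int      # 1 = Gas, 2 = Liquid, 3 = Solid; lower number wins
    description: str
    temp: float
    effective: date    # first day the record applies (inclusive)
    expiry: date       # last day the record applies (inclusive)


def flatten(periods, obs_start, obs_end):
    """Return (description, temp, effective, expiry) runs covering the observation period."""
    # 1. Clip each period to the observation period and drop any that fall outside it.
    clipped = [p._replace(effective=max(p.effective, obs_start),
                          expiry=min(p.expiry, obs_end)) for p in periods]
    clipped = [p for p in clipped if p.effective <= p.expiry]

    # 2. The winner can only change on a start date or on the day after an expiry.
    cuts = sorted({obs_start, obs_end + ONE_DAY}
                  | {p.effective for p in clipped}
                  | {p.expiry + ONE_DAY for p in clipped})

    runs = []  # each entry: [winning period, run start, run end]
    for seg_start, next_cut in zip(cuts, cuts[1:]):
        seg_end = next_cut - ONE_DAY
        active = [p for p in clipped if p.effective <= seg_start and seg_end <= p.expiry]
        if not active:
            continue  # no state covers this segment, so it produces no row
        winner = min(active, key=lambda p: p.priority)
        # 3. Extend the previous run if the same record also wins the adjoining segment.
        if runs and runs[-1][0] == winner and runs[-1][2] + ONE_DAY == seg_start:
            runs[-1][2] = seg_end
        else:
            runs.append([winner, seg_start, seg_end])
    return [(w.description, w.temp, s, e) for w, s, e in runs]


END_OF_TIME = date(9999, 12, 31)
SAMPLE = [
    Period(1, "first g", 101.328, date(2020, 9, 30), date(2021, 3, 31)),
    Period(1, "second g", 102.456, date(2021, 6, 1), date(2021, 7, 31)),
    Period(1, "third g", 100.011, date(2021, 9, 1), END_OF_TIME),
    Period(2, "first l", 98.99, date(2021, 1, 1), date(2021, 4, 30)),
    Period(2, "second l", 98.76, date(2021, 6, 1), date(2021, 8, 31)),
    Period(2, "third l", 99.978, date(2021, 9, 1), END_OF_TIME),
    Period(3, "first s", -0.145, date(2021, 1, 1), date(2021, 6, 30)),
    Period(3, "second s", -0.987, date(2021, 7, 1), END_OF_TIME),
]

if __name__ == "__main__":
    for row in flatten(SAMPLE, date(2021, 1, 1), date(2021, 12, 31)):
        print(row)
```

Run on the sample data, this yields the same six rows as the ObservationPeriodObserved table below. A set-based SQL version could follow the same steps (collect boundary dates, build segments, pick the winner per segment, merge runs); the final merge step is essentially a gaps-and-islands grouping.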

Any solution would help, but I'd also really appreciate a breakdown of the actual problem so I can educate myself on what I'm doing. I suspect the steps involved are:

Edited for clarity

create table ObservationPeriod (
    ObservationPeriodId    BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table GasPeriod (
    GasPeriodId            BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table LiquidPeriod (
    LiquidPeriodId         BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table SolidPeriod (
    SolidPeriodId          BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table ObservationPeriodObserved (
    ObservationPeriodObservedId BIGINT IDENTITY(1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

ObservationPeriod data

ObservationPeriodId  BusinessKey  Effective   Expiry
1                    24           2021-01-01  2021-12-31

GasPeriod data

GasPeriodId  BusinessKey  PeriodTemp  Description  Effective   Expiry
1            24           101.328     first g      2020-09-30  2021-03-31
2            24           102.456     second g     2021-06-01  2021-07-31
3            24           100.011     third g      2021-09-01  9999-12-31

LiquidPeriod data

LiquidPeriodId  BusinessKey  PeriodTemp  Description  Effective   Expiry
1               24           98.99       first l      2021-01-01  2021-04-30
2               24           98.76       second l     2021-06-01  2021-08-31
3               24           99.978      third l      2021-09-01  9999-12-31

SolidPeriod data

SolidPeriodId  BusinessKey  PeriodTemp  Description  Effective   Expiry
1              24           -0.145      first s      2021-01-01  2021-06-30
2              24           -0.987      second s     2021-07-01  9999-12-31

ObservationPeriodObserved data (desired result)

ObservationPeriodObservedId  BusinessKey  PeriodTemp  Description  Effective   Expiry
1                            24           101.328     first g      2021-01-01  2021-03-31
2                            24           98.99       first l      2021-04-01  2021-04-30
3                            24           -0.145      first s      2021-05-01  2021-05-31
4                            24           102.456     second g     2021-06-01  2021-07-31
5                            24           98.76       second l     2021-08-01  2021-08-31
6                            24           100.011     third g      2021-09-01  2021-12-31

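The idea is that a given ObservationPeriod row has many associated periods across the three grains above, but for any given stretch of time only one of them should be recorded as a subset of the ObservationPeriod.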
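The "subset of the ObservationPeriod" part is the standard interval-overlap test followed by an intersection; in SQL terms the overlap test reads `c.Effective <= op.Expiry AND c.Expiry >= op.Effective`. A small Python sketch, assuming inclusive dates; `overlaps` and `clip_to` are hypothetical helper names. It shows why the first Gas row, which starts in 2020, contributes only its 2021 portion.

```python
from datetime import date
from typing import Optional, Tuple

Range = Tuple[date, date]  # (Effective, Expiry), both inclusive


def overlaps(a: Range, b: Range) -> bool:
    # Two inclusive ranges share at least one day when each starts before the other ends.
    return a[0] <= b[1] and b[0] <= a[1]


def clip_to(period: Range, observation: Range) -> Optional[Range]:
    """Part of `period` that lies inside `observation`, or None if they don't overlap."""
    if not overlaps(period, observation):
        return None
    return (max(period[0], observation[0]), min(period[1], observation[1]))


obs = (date(2021, 1, 1), date(2021, 12, 31))
print(clip_to((date(2020, 9, 30), date(2021, 3, 31)), obs))  # first g
print(clip_to((date(2021, 9, 1), date(9999, 12, 31)), obs))  # third g
```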

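Please also assume that the separate grains are required here and that the problem can't be solved by putting this data into a single table; it can't. I can't use the actual business model here, so I've tried to stay as close to it conceptually as possible.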

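The following approach first builds a union of the three data sets, Gas, Liquid and Solid. The union adds a column, PeriodPriority, which helps select the winning row. I interpret the winning row as the period entry within the observation period that is the most recent and has not yet expired, chosen using the ranking Gas-1, Liquid-2, Solid-3. This is the basis for the DENSE_RANK window function, which orders by the latest Expiry date and then by PeriodPriority. Because this winning record may have dates outside the observation period, I use a CASE expression to make sure the inserted Expiry falls within the observation period.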
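To make the ranking concrete, here is a Python emulation of the `DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority)` column applied to the union of the sample rows. It is an illustration only; `dense_rank` is a made-up helper, not a library function.

```python
from datetime import date

END_OF_TIME = date(9999, 12, 31)
# (Description, PeriodPriority, Expiry) from the question's sample data
UNION_ROWS = [
    ("first g", 1, date(2021, 3, 31)),
    ("second g", 1, date(2021, 7, 31)),
    ("third g", 1, END_OF_TIME),
    ("first l", 2, date(2021, 4, 30)),
    ("second l", 2, date(2021, 8, 31)),
    ("third l", 2, END_OF_TIME),
    ("first s", 3, date(2021, 6, 30)),
    ("second s", 3, END_OF_TIME),
]


def dense_rank(rows, sort_key):
    """Like DENSE_RANK() OVER (ORDER BY ...): ties share a rank and ranks have no gaps."""
    distinct = sorted({sort_key(r) for r in rows})
    rank_of = {k: i + 1 for i, k in enumerate(distinct)}
    return [rank_of[sort_key(r)] for r in rows]


# ORDER BY Expiry DESC, PeriodPriority: negating the ordinal sorts dates descending.
ranks = dense_rank(UNION_ROWS, lambda r: (-r[2].toordinal(), r[1]))
rk = {row[0]: r for row, r in zip(UNION_ROWS, ranks)}
for name in sorted(rk, key=rk.get):
    print(rk[name], name)
```

Only third g gets rk = 1, so it is the one open-ended row the `OR rk=1` branch of the join lets through; the CASE expression then caps its Expiry at the observation period's end.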

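Although there is only one observation period, I've still included the WHERE clause WHERE op.ObservationPeriodId=1, which you can update or remove as needed. I also join on BusinessKey because I wasn't sure whether it changes across your whole data set. If BusinessKey never changes, it can be left out of the join condition.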
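One thing to watch if several BusinessKey values are present: the join filters rows per key, but the DENSE_RANK has no PARTITION BY, so rk is computed across every key at once, and a key whose latest record doesn't tie with the overall latest Expiry never gets rk = 1. Writing `DENSE_RANK() OVER (PARTITION BY t.BusinessKey ORDER BY t.Expiry DESC, t.PeriodPriority)` would rank each key separately; that change is my assumption, not part of the query. The Python sketch below shows the difference using a hypothetical second key, 25.

```python
from collections import defaultdict
from datetime import date


def dense_rank(rows, sort_key, partition_key=lambda r: None):
    """DENSE_RANK() OVER ([PARTITION BY ...] ORDER BY ...); default is a single partition."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[partition_key(row)].append(i)
    ranks = [0] * len(rows)
    for indexes in groups.values():
        distinct = sorted({sort_key(rows[i]) for i in indexes})
        rank_of = {k: n + 1 for n, k in enumerate(distinct)}
        for i in indexes:
            ranks[i] = rank_of[sort_key(rows[i])]
    return ranks


# (BusinessKey, PeriodPriority, Expiry); key 25 is hypothetical
ROWS = [
    (24, 1, date(9999, 12, 31)),
    (24, 2, date(9999, 12, 31)),
    (25, 1, date(2021, 10, 31)),
]


def order_by(row):
    return (-row[2].toordinal(), row[1])  # Expiry DESC, PeriodPriority


print(dense_rank(ROWS, order_by))                                # no PARTITION BY
print(dense_rank(ROWS, order_by, partition_key=lambda r: r[0]))  # PARTITION BY BusinessKey
```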

The resulting query currently looks like this:

SELECT 
    ROW_NUMBER() OVER (ORDER BY c.Effective, c.Expiry ) as ObservationPeriodObservedId,
    c.BusinessKey,
    c.PeriodTemp, 
    c.Description,
    c.Effective,
    CASE 
        WHEN c.Expiry >= op.Expiry THEN op.Expiry
        ELSE c.Expiry
    END as Expiry
FROM (
    SELECT 
        *,
        DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
    FROM ( 
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority 
        FROM GasPeriod
        UNION ALL
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 2 
        FROM LiquidPeriod
        UNION ALL
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 3 
        FROM SolidPeriod
    ) t
) c
INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
                                   op.Effective <= c.Effective AND
                                   (op.Expiry >= c.Expiry OR rk=1)
WHERE op.ObservationPeriodId=1
ORDER BY c.Effective, c.Expiry

And the INSERT statement:

INSERT INTO ObservationPeriodObserved (BusinessKey, PeriodTemp, Description, Effective, Expiry)
SELECT 
    c.BusinessKey,
    c.PeriodTemp, 
    c.Description,
    c.Effective,
    CASE 
        WHEN c.Expiry >= op.Expiry THEN op.Expiry
        ELSE c.Expiry
    END as Expiry
FROM (
    SELECT 
        *,
        DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
    FROM ( 
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority 
        FROM GasPeriod
        UNION ALL
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 2 
        FROM LiquidPeriod
        UNION ALL
        SELECT 
            BusinessKey,PeriodTemp, Description,Effective,Expiry, 3 
        FROM SolidPeriod
    ) t
) c
INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
                                   op.Effective <= c.Effective AND
                                   (op.Expiry >= c.Expiry OR rk=1)
WHERE op.ObservationPeriodId=1
ORDER BY c.Effective, c.Expiry

This produces the desired result.

View working db fiddle here
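It can help to trace which union rows survive the join. The Python sketch below hand-translates the predicate `op.Effective <= c.Effective AND (op.Expiry >= c.Expiry OR rk=1)`, the Expiry CASE and the ORDER BY, and applies them to the sample data exactly as listed in the question. It is an approximation of what SQL Server would return, not a replacement for running the query.

```python
from datetime import date

OBS_START, OBS_END = date(2021, 1, 1), date(2021, 12, 31)  # ObservationPeriodId = 1
END_OF_TIME = date(9999, 12, 31)

# (Description, PeriodPriority, Effective, Expiry) as listed in the question
UNION_ROWS = [
    ("first g", 1, date(2020, 9, 30), date(2021, 3, 31)),
    ("second g", 1, date(2021, 6, 1), date(2021, 7, 31)),
    ("third g", 1, date(2021, 9, 1), END_OF_TIME),
    ("first l", 2, date(2021, 1, 1), date(2021, 4, 30)),
    ("second l", 2, date(2021, 6, 1), date(2021, 8, 31)),
    ("third l", 2, date(2021, 9, 1), END_OF_TIME),
    ("first s", 3, date(2021, 1, 1), date(2021, 6, 30)),
    ("second s", 3, date(2021, 7, 1), END_OF_TIME),
]


def order_key(row):
    # DENSE_RANK() OVER (ORDER BY Expiry DESC, PeriodPriority)
    return (-row[3].toordinal(), row[1])


distinct_keys = sorted({order_key(r) for r in UNION_ROWS})
rk = {r[0]: distinct_keys.index(order_key(r)) + 1 for r in UNION_ROWS}


def keep(row):
    # op.Effective <= c.Effective AND (op.Expiry >= c.Expiry OR rk = 1)
    desc, _prio, eff, exp = row
    return OBS_START <= eff and (OBS_END >= exp or rk[desc] == 1)


# ORDER BY c.Effective, c.Expiry (on the raw Expiry)
survivors = sorted((r for r in UNION_ROWS if keep(r)), key=lambda r: (r[2], r[3]))
# CASE WHEN c.Expiry >= op.Expiry THEN op.Expiry ELSE c.Expiry END
kept = [(desc, eff, min(exp, OBS_END)) for desc, _prio, eff, exp in survivors]

for row in kept:
    print(row)
```

With that data, first g is filtered out because its Effective (2020-09-30) is earlier than op.Effective, and first s (ending 2021-06-30) still overlaps second g in June, which is the overlap discussed in Edit 1.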

Edit 1 - Continuous Effective and Expiry dates

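Based on the updated question and the comments, I've modified the above to use LAG and DATEADD to produce continuous dates. The INSERT query is included below (its latter part is the SELECT), along with an updated db fiddle that gives the desired results. The one exception is SolidPeriod record 1, whose end date is 2021-06-30. When that date is changed to 2021-05-31, as in the desired results, the query corrects the one date that didn't match expectations. Let me know if there are other considerations here or if the sample data contains an error. I adjusted the sample data rather than computing the change on the fly, because I couldn't infer any logic for making such a change (i.e., subtracting a month from an arbitrary record). Let me know whether this works for you, and advise further.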
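The LAG/DATEADD step simply makes each row start the day after the previous row's Expiry, with the first row starting at op.Effective. Below is a Python model of that CASE expression (`chain_effective` is a made-up name), fed with the five rows the join keeps for the question's listed data, with first s ending 2021-05-31 as described above. It also shows why the original 2021-06-30 end date matters: with it, second g would start on 2021-07-01 instead of 2021-06-01.

```python
from datetime import date, timedelta

OBS_START, OBS_END = date(2021, 1, 1), date(2021, 12, 31)


def chain_effective(rows, obs_start, obs_end):
    """rows: (description, expiry) pairs already in ORDER BY Effective, Expiry order."""
    out = []
    prev_expiry = None  # LAG(c.Expiry) is NULL for the first row
    for description, expiry in rows:
        if prev_expiry is None:
            effective = obs_start                        # THEN op.Effective
        else:
            effective = prev_expiry + timedelta(days=1)  # DATEADD(DAY, 1, LAG(c.Expiry))
        out.append((description, effective, min(expiry, obs_end)))  # Expiry capped at op.Expiry
        prev_expiry = expiry  # LAG sees the raw, uncapped Expiry
    return out


ADJUSTED = [
    ("first l", date(2021, 4, 30)),
    ("first s", date(2021, 5, 31)),  # adjusted from 2021-06-30
    ("second g", date(2021, 7, 31)),
    ("second l", date(2021, 8, 31)),
    ("third g", date(9999, 12, 31)),
]

for row in chain_effective(ADJUSTED, OBS_START, OBS_END):
    print(row)
```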

INSERT INTO ObservationPeriodObserved (BusinessKey, PeriodTemp, Description, Effective, Expiry)

    SELECT 
        c.BusinessKey,
        c.PeriodTemp, 
        c.Description,
        CASE
            WHEN LAG(c.Expiry) OVER (ORDER BY c.Effective, c.Expiry) IS NULL THEN op.Effective
            ELSE DATEADD(DAY,1,LAG(c.Expiry) OVER (ORDER BY c.Effective, c.Expiry))
        END as Effective,
        CASE 
            WHEN c.Expiry >= op.Expiry THEN op.Expiry
            ELSE c.Expiry
        END as Expiry
        
    FROM (
        SELECT 
            *,
            DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
        FROM ( 
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority 
            FROM GasPeriod
            UNION ALL
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 2 
            FROM LiquidPeriod
            UNION ALL
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 3 
            FROM SolidPeriod
        ) t
    ) c
    INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
                                       op.Effective <= c.Effective AND
                                       (op.Expiry >= c.Expiry OR rk=1)
    WHERE op.ObservationPeriodId=1
    ORDER BY c.Effective, c.Expiry 

View Demo DB Fiddle
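Because each Effective is now derived from the previous row rather than read from the winning record, it may be worth checking the inserted rows afterwards: if a row ends on or before the row preceding it, its derived Effective lands after its own Expiry. A hedged Python sketch of such a check (hypothetical helper `tiling_problems`, assuming inclusive dates and rows ordered by Effective) that reports inverted ranges, gaps, overlaps and incomplete coverage:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def tiling_problems(rows, obs_start, obs_end):
    """rows: (description, effective, expiry) sorted by effective. Returns a list of issues."""
    problems = []
    expected_start = obs_start
    for description, effective, expiry in rows:
        if effective > expiry:
            problems.append(f"{description}: Effective {effective} is after Expiry {expiry}")
        if effective != expected_start:  # a gap or an overlap with the previous row
            problems.append(f"{description}: expected to start {expected_start}, starts {effective}")
        expected_start = expiry + ONE_DAY
    if expected_start != obs_end + ONE_DAY:
        problems.append(f"coverage ends {expected_start - ONE_DAY}, not {obs_end}")
    return problems


DESIRED = [
    ("first g", date(2021, 1, 1), date(2021, 3, 31)),
    ("first l", date(2021, 4, 1), date(2021, 4, 30)),
    ("first s", date(2021, 5, 1), date(2021, 5, 31)),
    ("second g", date(2021, 6, 1), date(2021, 7, 31)),
    ("second l", date(2021, 8, 1), date(2021, 8, 31)),
    ("third g", date(2021, 9, 1), date(2021, 12, 31)),
]
print(tiling_problems(DESIRED, date(2021, 1, 1), date(2021, 12, 31)))
```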

Let me know if this works for you.