SQL - Convert Time Series Events into On/Off Pairs (handling potential missing On's or Off's)

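In SQL Server, I have a set of time series On/Off events that looks like this (for simplicity I am only showing one alarm number, but there are multiple alarm numbers in the same table):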

'Alarms' Table:
AlarmNumber   Time                      AlarmState
1592          2020-01-02 01:52:02       1
1592          2020-01-02 01:58:07       0
1592          2020-04-28 03:46:49       1
1592          2020-04-28 06:19:10       0
1592          2020-06-04 00:25:22       1
1592          2020-08-27 01:57:03       1
1592          2020-08-27 05:16:32       0
1592          2020-09-17 02:51:57       0

I am trying to convert this into On/Off pairs:

Output I am trying to achieve, ideally as an SQL View:
AlarmNumber   StartTime                 EndTime
1592          2020-01-02 01:52:02       2020-01-02 01:58:07
1592          2020-04-28 03:46:49       2020-04-28 06:19:10
1592          2020-06-04 00:25:22       NULL
1592          2020-08-27 01:57:03       2020-08-27 05:16:32
1592          NULL                      2020-09-17 02:51:57

If I had a clean dataset with no missing 'On' or 'Off' events, I could achieve this as follows:

select tOn.AlarmNumber, tOn.Time StartTime, tOff.Time EndTime
from (
select AlarmNumber, Time, 
       ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 1
) tOn
LEFT JOIN (
select AlarmNumber, Time, 
       ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 0
) tOff
on (tOn.AlarmNumber = tOff.AlarmNumber and tOn.EventID = tOff.EventID)

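(Code adapted from Adriano Carneiro's answer to "T-SQL Start and end date times from a single column")

My question: can anyone think of an efficient way to process the 'Alarms' table to achieve my example output, handling missing On/Off events (shown as NULL in the example output)?

My fallback is to use a Cursor and a While loop, but I am hoping there is a way to do it by grouping the On/Off event pairs together; I just haven't been able to get it to work. I have 500k+ events, so this is a large dataset to iterate over.

Any ideas are welcome!

Thanks, Thomas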

------ Update, 1 November 2020 ------

Two good solutions have been provided. Both work correctly and give identical results on an 80,000-row sample of messy real-world data.

Once the rows are numbered in order, simply SELECT them in separate parts and combine the results with UNION ALL:

DECLARE @DataSource TABLE
(
    [AlarmNumber] INT
   ,[Time] DATETIME2(0)
   ,[AlarmState] INT
);

INSERT INTO @DataSource ([AlarmNumber], [Time], [AlarmState])
VALUES (1592, '2020-01-02 01:52:02', 1)
      ,(1592, '2020-01-02 01:58:07', 0)
      ,(1592, '2020-04-28 03:46:49', 1)
      ,(1592, '2020-04-28 06:19:10', 0)
      ,(1592, '2020-06-04 00:25:22', 1)
      ,(1592, '2020-08-27 01:57:03', 1)
      ,(1592, '2020-08-27 05:16:32', 0)
      ,(1592, '2020-09-17 02:51:57', 0);

-- Add a rowID column to the data
WITH DataSource AS
(
    SELECT * ,ROW_NUMBER() Over(Partition by AlarmNumber order by [Time]) rowID
    FROM @DataSource
)

-- This is just here so we can sort the result at the end
SELECT * FROM (

-- Select rows of DataSource where there is an ON and subsequent OFF event (DS1 Alarm is ON and DS2 Alarm is OFF)
-- This also catches where there is an ON, but no subsequent OFF (DS2.Time will be NULL)
    SELECT DS1.AlarmNumber
            ,DS1.Time As StartTime
            ,DS2.Time As EndTime
    FROM DataSource DS1
    LEFT JOIN DataSource DS2
        ON DS1.[rowID] = DS2.[rowID] - 1
        AND DS1.AlarmNumber = DS2.AlarmNumber
        AND DS2.[AlarmState] = 0
    WHERE DS1.[AlarmState] = 1

    UNION ALL

    -- Select rows of DataSource where there is an OFF and there is no matching ON (aka it turned OFF without ever turning ON)
    SELECT DS2.AlarmNumber
            ,NULL As StartTime
            ,DS2.Time As EndTime
    FROM DataSource DS2

    INNER JOIN DataSource DS1
        ON DS2.[rowID] -1 = DS1.[rowID]
        AND DS1.[AlarmState] = 0
        AND DS2.AlarmNumber = DS1.AlarmNumber
    
    WHERE DS2.[AlarmState] = 0

    UNION ALL

    -- Select rows of DataSource where the first event for this alarm number is an OFF (it would otherwise be missed by the above)
    SELECT DS1.AlarmNumber
            ,NULL As StartTime
            ,DS1.Time As EndTime
    FROM DataSource DS1
    WHERE DS1.[AlarmState] = 0 AND DS1.rowID = 1
) z
ORDER BY COALESCE(StartTime,EndTime), AlarmNumber

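A group consists of two consecutive rows where the first row has state 1 and the second has state 0; any other row forms a group on its own. I would address this with window functions, as follows: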

select 
    alarmnumber,
    max(case when alarmstate = 1 then time end) start_time,
    max(case when alarmstate = 0 then time end) end_time
from (
    select a.*, 
        sum(case when alarmstate = 0 and lag_alarmstate = 1 then 0 else 1 end)
            over(partition by alarmnumber order by time) grp
    from (
        select a.*, 
            lag(alarmstate) over(partition by alarmnumber order by time) lag_alarmstate
        from alarms a
    ) a
) a
group by alarmnumber, grp

This uses lag() to retrieve the "previous" state and a cumulative sum to define the groups. The final step is conditional aggregation.
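
To see how the groups form, the two inner steps can be run on their own against the sample rows. The following is a minimal sketch, assuming a table variable named @Alarms (a hypothetical stand-in for the Alarms table) loaded with the eight rows from the question; LAG requires SQL Server 2012 or later.

```sql
-- Hypothetical stand-in for the Alarms table, holding the question's sample rows
DECLARE @Alarms TABLE
(
    AlarmNumber INT,
    [Time]      DATETIME2(0),
    AlarmState  INT
);

INSERT INTO @Alarms (AlarmNumber, [Time], AlarmState)
VALUES (1592, '2020-01-02 01:52:02', 1),
       (1592, '2020-01-02 01:58:07', 0),
       (1592, '2020-04-28 03:46:49', 1),
       (1592, '2020-04-28 06:19:10', 0),
       (1592, '2020-06-04 00:25:22', 1),
       (1592, '2020-08-27 01:57:03', 1),
       (1592, '2020-08-27 05:16:32', 0),
       (1592, '2020-09-17 02:51:57', 0);

-- Show the intermediate columns, before any aggregation
SELECT s.AlarmNumber,
       s.[Time],
       s.AlarmState,
       s.lag_alarmstate,
       -- A row opens a new group unless it is an Off directly after an On
       SUM(CASE WHEN s.AlarmState = 0 AND s.lag_alarmstate = 1 THEN 0 ELSE 1 END)
           OVER (PARTITION BY s.AlarmNumber ORDER BY s.[Time]) AS grp
FROM (
    -- Previous state within the same alarm number (NULL for the first row)
    SELECT a.*,
           LAG(a.AlarmState) OVER (PARTITION BY a.AlarmNumber ORDER BY a.[Time]) AS lag_alarmstate
    FROM @Alarms a
) s
ORDER BY s.AlarmNumber, s.[Time];
```

On these rows grp comes out as 1, 1, 2, 2, 3, 4, 4, 5. Each On followed directly by an Off shares a group number, while the unmatched On (2020-06-04) and the unmatched Off (2020-09-17) each get a group of their own, which is why they end up with a NULL on one side after the aggregation.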

Demo on DB Fiddle, showing the result of the grouped query:

alarmnumber | start_time              | end_time               
:---------- | :---------------------- | :----------------------
1592        | 2020-01-02 01:52:02.000 | 2020-01-02 01:58:07.000
1592        | 2020-04-28 03:46:49.000 | 2020-04-28 06:19:10.000
1592        | 2020-06-04 00:25:22.000 | null                   
1592        | 2020-08-27 01:57:03.000 | 2020-08-27 05:16:32.000
1592        | null                    | 2020-09-17 02:51:57.000
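
Since the question asks for the result as a view, the window-function query can be wrapped in one more or less as it stands. This is a minimal sketch, assuming the table lives at dbo.Alarms and using the hypothetical view name dbo.AlarmOnOffPairs; the derived-table aliases and the PrevState column name are also illustrative.

```sql
-- dbo.AlarmOnOffPairs is a hypothetical name; dbo.Alarms assumes the default schema
CREATE VIEW dbo.AlarmOnOffPairs
AS
SELECT
    g.AlarmNumber,
    MAX(CASE WHEN g.AlarmState = 1 THEN g.[Time] END) AS StartTime,
    MAX(CASE WHEN g.AlarmState = 0 THEN g.[Time] END) AS EndTime
FROM (
    -- Running count of group starts; an Off directly after an On stays in the On's group
    SELECT s.AlarmNumber, s.[Time], s.AlarmState,
           SUM(CASE WHEN s.AlarmState = 0 AND s.PrevState = 1 THEN 0 ELSE 1 END)
               OVER (PARTITION BY s.AlarmNumber ORDER BY s.[Time]) AS grp
    FROM (
        -- Previous state within the same alarm number
        SELECT a.AlarmNumber, a.[Time], a.AlarmState,
               LAG(a.AlarmState) OVER (PARTITION BY a.AlarmNumber ORDER BY a.[Time]) AS PrevState
        FROM dbo.Alarms AS a
    ) AS s
) AS g
GROUP BY g.AlarmNumber, g.grp;
GO

-- Usage: sort at query time, since a view has no guaranteed row order
SELECT AlarmNumber, StartTime, EndTime
FROM dbo.AlarmOnOffPairs
ORDER BY AlarmNumber, COALESCE(StartTime, EndTime);
```

GO is a batch separator understood by SSMS and sqlcmd rather than a T-SQL statement; it is there because CREATE VIEW has to be the first statement in its batch.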