Islands and Gaps algorithm does not produce a globally unique id for each island and gap

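I am using the standard Islands and Gaps algorithm to find blocks of consecutive values (1 or 0). The ProductionState column marks periods of producing or not producing, based on readings from a sensor attached to the machine. The relevant step is contained in this common table expression: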

-- Production state islands with unique Id
production_state_03( Timestamp, ProductionState, ProductionStateIslandId ) as
(
    select
        Timestamp,
        ProductionState,
        row_number() over ( order by Timestamp ) - row_number() over ( partition by ProductionState order by ProductionState )
    from production_state_02
)

The result is the following table:

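The problem is that the ProductionStateIslandId of each island or gap is not necessarily globally unique, which causes errors in later analysis steps. Is there a different way to compute islands and gaps that always produces globally unique Id values?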

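This: row_number() over ( partition by ProductionState order by ProductionState )

makes no sense. Within a partition every row has the same ProductionState, so the order by does not order anything. All it does is create a number that seems to be ordered but is in reality random.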

Your gaps are unusual in that they are not real gaps: the rows with value 0 still exist. Maybe a conditional sum will help:

row_number() over ( order by Timestamp ) - sum(ProductionState) over (order by Timestamp)
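As a rough illustration of what that expression yields, here is a small sketch on made-up data. The six inline rows, with integer timestamps standing in for real ones, are an assumption and not the asker's data. The difference equals the number of 0 rows seen so far, so it stays constant across a run of 1s:

```sql
-- Hypothetical sample data: states 1,1,0,0,1,1 at integer "timestamps" 1..6.
with production_state_02 ([Timestamp], ProductionState) as
(
    select * from (values
        (1, 1), (2, 1), (3, 0), (4, 0), (5, 1), (6, 1)
    ) v ([Timestamp], ProductionState)
)
select
    [Timestamp],
    ProductionState,
    -- row number minus running count of 1s = number of 0 rows so far
    row_number() over (order by [Timestamp])
  - sum(ProductionState) over (order by [Timestamp]) as Diff
from production_state_02
order by [Timestamp];

-- Timestamp  ProductionState  Diff
-- 1          1                0
-- 2          1                0
-- 3          0                1
-- 4          0                2
-- 5          1                2
-- 6          1                2
```

On this sample the two runs of 1s get the values 0 and 2, while the 0 rows get 1 and 2, so the value seems best treated as an id for the runs of 1s only.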

The second row_number should also be ordered by the timestamp.

  row_number() over (order by [Timestamp]) 
  - row_number() over (partition by ProductionState 
                       order by [Timestamp])

  row_number() over (order by [Timestamp]) 
  + row_number() over (partition by ProductionState 
                       order by [Timestamp] DESC)

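But that correction does not make the value globally unique. It is only unique within a given ProductionState, so an island of 1s and an island of 0s can still share the same number.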
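A minimal sketch of such a collision, using six made-up rows (the inline data and the integer timestamps are assumptions for illustration only):

```sql
-- Hypothetical sample data: states 1,1,0,0,1,1 at integer "timestamps" 1..6.
with production_state_02 ([Timestamp], ProductionState) as
(
    select * from (values
        (1, 1), (2, 1), (3, 0), (4, 0), (5, 1), (6, 1)
    ) v ([Timestamp], ProductionState)
)
select
    [Timestamp],
    ProductionState,
    -- corrected difference of row numbers, both ordered by [Timestamp]
    row_number() over (order by [Timestamp])
  - row_number() over (partition by ProductionState order by [Timestamp]) as Diff
from production_state_02
order by [Timestamp];

-- Timestamp  ProductionState  Diff
-- 1          1                0
-- 2          1                0
-- 3          0                2
-- 4          0                2
-- 5          1                2
-- 6          1                2
```

Here the block of 0s (rows 3 and 4) and the second block of 1s (rows 5 and 6) both get Diff = 2. The pair (ProductionState, Diff) is still distinct per block, so grouping by both columns is one way to keep this approach if a single-column id is not required.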

Another way to calculate such a ranking is to sum a change flag.

production_state_03 ([Timestamp], ProductionState, ProductionStateIslandId) as
(
    select [Timestamp], ProductionState
    , rnk = SUM(flag) over (order by [Timestamp])
    from
    (
        select [Timestamp], ProductionState
        , flag = IIF(ProductionState = LAG(ProductionState) over (order by [Timestamp]), 0, 1)
        from production_state_02
    ) q
)

This gaps-and-islands trick does need an extra subquery, but the resulting ranking is sequential, so every island and gap gets its own id.
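To see the change-flag idea on concrete values, here is a self-contained sketch on six made-up rows. The inline data, the integer timestamps and the `flagged` CTE name are assumptions, and `case` is used in place of `IIF` purely as an equivalent spelling. It needs `lag`, so SQL Server 2012 or later is assumed:

```sql
-- Hypothetical sample data: states 1,1,0,0,1,1 at integer "timestamps" 1..6.
with production_state_02 ([Timestamp], ProductionState) as
(
    select * from (values
        (1, 1), (2, 1), (3, 0), (4, 0), (5, 1), (6, 1)
    ) v ([Timestamp], ProductionState)
),
flagged as
(
    select
        [Timestamp],
        ProductionState,
        -- 1 on the first row (lag is NULL there) and wherever the state changes
        case when ProductionState = lag(ProductionState) over (order by [Timestamp])
             then 0 else 1 end as flag
    from production_state_02
)
select
    [Timestamp],
    ProductionState,
    -- running total of change flags = sequential block number
    sum(flag) over (order by [Timestamp] rows unbounded preceding) as ProductionStateIslandId
from flagged
order by [Timestamp];

-- Timestamp  ProductionState  ProductionStateIslandId
-- 1          1                1
-- 2          1                1
-- 3          0                2
-- 4          0                2
-- 5          1                3
-- 6          1                3
```

The explicit `rows unbounded preceding` frame is a design choice in this sketch: it keeps the running total strictly row by row even if two readings were to share a timestamp, whereas the default frame would treat tied rows as peers.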