Count values, checking whether they are consecutive

This is my table:

Event       Order               Timestamp
delFailed   281475031393706     2018-07-24T15:48:08.000Z
reopen      281475031393706     2018-07-24T15:54:36.000Z
reopen      281475031393706     2018-07-24T15:54:51.000Z

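I need to count the 'delFailed' and 'reopen' events in order to compute #delFailed - #reopen. The difficulty is that two identical consecutive events must be counted only once, so in this case the result should be 0 rather than -1.

This is what I have so far (it is wrong: it gives me -1 instead of 0, because there are two consecutive "reopen" events):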

with 
    events as (
        select 
            event as events,
            orders,
            "timestamp"
        from main_source_execevent
        where orders = '281475031393706'
        and event in ('reopen', 'delFailed')
        order by "timestamp"
    ),
    count_events as (
        select 
            count(events) as CEvents,
            events,
            orders
        from events
        group by orders, events
    )
select (
    (select cevents from count_events where events = 'delFailed') - (select cevents from count_events where events = 'reopen')
) as nAttempts,
orders
from count_events
group by orders

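How can I count two identical consecutive events only once?

This is a gaps-and-islands problem. You can use row numbers to check whether rows are identical consecutive events.

Explanation

  1. One row number generated over all rows, ordered by Timestamp (grp).
  2. Another row number generated per Event value, also ordered by Timestamp (rn).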

SELECT *
  FROM (
    SELECT *
          ,ROW_NUMBER() OVER(ORDER BY Timestamp) grp
          ,ROW_NUMBER() OVER(PARTITION BY Event ORDER BY Timestamp) rn
    FROM T
  ) t1


|     event |           Order |            timestamp | grp | rn |
|-----------|-----------------|----------------------|-----|----|
| delFailed | 281475031393706 | 2018-07-24T15:48:08Z |   1 |  1 |
|    reopen | 281475031393706 | 2018-07-24T15:54:36Z |   2 |  1 |
|    reopen | 281475031393706 | 2018-07-24T15:54:51Z |   3 |  2 |

Once you have these two row numbers you get the result above. Then use grp - rn to work out whether rows are consecutive.

 SELECT *,grp-rn
  FROM (
    SELECT *
          ,ROW_NUMBER() OVER(ORDER BY Timestamp) grp
          ,ROW_NUMBER() OVER(PARTITION BY Event ORDER BY Timestamp) rn
    FROM T
  ) t1

|     event |           Order |            timestamp | grp | rn |   grp-rn |
|-----------|-----------------|----------------------|-----|----|----------|
| delFailed | 281475031393706 | 2018-07-24T15:48:08Z |   1 |  1 |        0 |
|    reopen | 281475031393706 | 2018-07-24T15:54:36Z |   2 |  1 |        1 |
|    reopen | 281475031393706 | 2018-07-24T15:54:51Z |   3 |  2 |        1 |

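You can see that when two identical events are consecutive, their grp-rn value is the same, so we can GROUP BY the grp-rn column (together with Event) to get one row per run, and then count those rows.

Final query.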
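
To see why grp - rn works as a key for each run, the same bookkeeping can be sketched outside the database. This is only an illustration in Python, not part of the query: the helper name `island_counts` and the sample list are made up for this example, and the events are assumed to be already sorted by timestamp.

```python
from collections import Counter

def island_counts(events):
    """Count runs of identical consecutive events (hypothetical helper).

    `events` is assumed to be already ordered by timestamp.
    """
    seen = Counter()   # running count per event, plays the role of rn
    islands = set()    # distinct (event, grp - rn) pairs, one per run
    for grp, event in enumerate(events, start=1):
        seen[event] += 1
        rn = seen[event]
        islands.add((event, grp - rn))
    # each distinct pair is one run, so it is counted once
    return Counter(event for event, _ in islands)

counts = island_counts(["delFailed", "reopen", "reopen"])
result = counts["delFailed"] - counts["reopen"]
print(result)  # 0
```

Both "reopen" rows map to the pair ("reopen", 1), so they collapse into a single run.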

CREATE TABLE T(
  Event VARCHAR(50),
  "Order"  VARCHAR(50),
  Timestamp Timestamp
); 

INSERT INTO T VALUES ('delFailed',281475031393706,'2018-07-24T15:48:08.000Z');
INSERT INTO T VALUES ('reopen',281475031393706,'2018-07-24T15:54:36.000Z');
INSERT INTO T VALUES ('reopen',281475031393706,'2018-07-24T15:54:51.000Z');

Query 1:

SELECT 
    -- ELSE 0 keeps the result from becoming NULL when one event type is absent
    SUM(CASE WHEN event = 'delFailed' THEN 1 ELSE 0 END) -
    SUM(CASE WHEN event = 'reopen' THEN 1 ELSE 0 END) result
FROM (
  -- one row per run of identical consecutive events
  SELECT Event
  FROM (
    SELECT *
          ,ROW_NUMBER() OVER(ORDER BY Timestamp) grp
          ,ROW_NUMBER() OVER(PARTITION BY Event ORDER BY Timestamp) rn
    FROM T
  ) t1
  GROUP BY grp - rn, Event
) t2

Results:

| result |
|--------|
|      0 |
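
The query above treats the whole table as a single order. If T held several orders, the row numbers would presumably need to be partitioned by "Order" as well. The following is a rough sketch of that variant, run through Python's built-in sqlite3 module rather than the original database (this assumes an SQLite build with window functions, 3.25 or later). The second order, '42', is invented purely to show the partitioning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE T (Event TEXT, "Order" TEXT, Timestamp TEXT)')
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("delFailed", "281475031393706", "2018-07-24T15:48:08.000Z"),
        ("reopen", "281475031393706", "2018-07-24T15:54:36.000Z"),
        ("reopen", "281475031393706", "2018-07-24T15:54:51.000Z"),
        # hypothetical second order, not in the original data
        ("delFailed", "42", "2018-07-25T10:00:00.000Z"),
        ("delFailed", "42", "2018-07-25T10:05:00.000Z"),
    ],
)

query = '''
SELECT "Order",
       SUM(CASE WHEN Event = 'delFailed' THEN 1 ELSE 0 END) -
       SUM(CASE WHEN Event = 'reopen' THEN 1 ELSE 0 END) AS result
FROM (
  -- one row per run of identical consecutive events within an order
  SELECT "Order", Event
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY "Order" ORDER BY Timestamp) AS grp,
           ROW_NUMBER() OVER (PARTITION BY "Order", Event ORDER BY Timestamp) AS rn
    FROM T
  ) numbered
  GROUP BY "Order", Event, grp - rn
) islands
GROUP BY "Order"
ORDER BY "Order"
'''
rows = conn.execute(query).fetchall()
print(rows)  # [('281475031393706', 0), ('42', 1)]
```

The two consecutive 'delFailed' rows of the invented order count once, and that order has no 'reopen', so its result is 1.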

I would just use lag() to pick out the first event in each sequence of identical values, and then count:

select sum( (event = 'reopen')::int ) as num_reopens,
       sum( (event = 'delFailed')::int ) as num_delFailed
from (select mse.*,
             lag(event) over (partition by orders order by "timestamp") as prev_event
      from main_source_execevent mse
      where orders = '281475031393706' and
            event in ('reopen', 'delFailed')
     ) e
where prev_event <> event or prev_event is null;
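
The query returns the two counts separately; subtracting num_reopens from num_delFailed gives the number the question asks for. As a cross-check of the idea, the "differs from the previous row" filter can be sketched in Python. The helper name `count_first_of_runs` is hypothetical, and the input is assumed to be one order's events sorted by timestamp.

```python
def count_first_of_runs(events):
    """Count an event only when it differs from the previous one (hypothetical helper).

    `events` is assumed to be ordered by timestamp, mirroring
    lag(event) over (... order by "timestamp").
    """
    counts = {}
    prev_event = None  # like lag(): the first row has no previous event
    for event in events:
        if prev_event is None or prev_event != event:
            counts[event] = counts.get(event, 0) + 1
        prev_event = event
    return counts

counts = count_first_of_runs(["delFailed", "reopen", "reopen"])
n_attempts = counts.get("delFailed", 0) - counts.get("reopen", 0)
print(n_attempts)  # 0
```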