Row_number based on the ID and date

I need to select the INACTIVE rows for each ID and number them so that the count restarts with every run of consecutive dates.

Sample Data:
2020-04-19,SQA0199,ACTIVE
2020-04-20,SQA0199,INACTIVE
2020-04-21,SQA0199,INACTIVE
2020-04-22,SQA0199,INACTIVE
2020-04-23,SQA0199,ACTIVE
2020-04-24,SQA0199,INACTIVE
2020-04-25,SQA0199,INACTIVE
2020-04-26,SQA0199,INACTIVE
Sample Script:
SELECT
  ROW_NUMBER() OVER (PARTITION BY SQA_ID ORDER BY timestamp) AS "row number",
  timestamp, SQA_ID
FROM SQA_SMS_INACTIVE WHERE status = 'INACTIVE';
Desired Output:
2020-04-20,SQA0199,1
2020-04-21,SQA0199,2
2020-04-22,SQA0199,3
2020-04-24,SQA0199,1
2020-04-25,SQA0199,2
2020-04-26,SQA0199,3

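The row number in my script's output keeps counting up instead of restarting at 1 after a gap in the dates. Please help me fix this.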

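Using the LAG() and SUM() window functions, you can build the groups of rows that the row numbers are based on. LAG() flags each row whose date is not exactly one day after the previous INACTIVE row for the same ID, and a running SUM() of those flags gives every run of consecutive dates its own group number: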

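WITH
  pre AS (
    -- flag = 1 when the previous INACTIVE date for this ID is not the day before;
    -- it is NULL on the first row of each ID, where LAG() has nothing to return
    SELECT *,
      DATEDIFF(
        timestamp,
        LAG(timestamp) OVER (PARTITION BY SQA_ID ORDER BY timestamp)
      ) <> 1 AS flag
    FROM SQA_SMS_INACTIVE
    WHERE status = 'INACTIVE'
  ),
  cte AS (
    -- the running total of the flags is the group number of each run of consecutive dates
    SELECT timestamp, SQA_ID,
      SUM(COALESCE(flag, 0) <> 0) OVER (PARTITION BY SQA_ID ORDER BY timestamp) AS grp
    FROM pre
  )
-- restart the numbering inside every (SQA_ID, grp) group
SELECT timestamp, SQA_ID,
  ROW_NUMBER() OVER (PARTITION BY SQA_ID, grp ORDER BY timestamp) AS `row number`
FROM cte;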

See the demo.
Results:

| timestamp  | SQA_ID  | row number |
| ---------- | ------- | ---------- |
| 2020-04-20 | SQA0199 | 1          |
| 2020-04-21 | SQA0199 | 2          |
| 2020-04-22 | SQA0199 | 3          |
| 2020-04-24 | SQA0199 | 1          |
| 2020-04-25 | SQA0199 | 2          |
| 2020-04-26 | SQA0199 | 3          |
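
To check the grouping logic outside the database, the same three steps (flag a gap, keep a running group number, restart the row number per group) can be traced in a few lines of Python. This is only a sketch of the idea, not a replacement for the query: the function name `number_runs` and the tuple layout are made up for illustration, and the input is assumed to contain only the INACTIVE rows as `(date, SQA_ID)` pairs.

```python
from datetime import date


def number_runs(rows):
    """Return (day, sqa_id, grp, row_number) for each (day, sqa_id) input row.

    Hypothetical helper: mirrors PARTITION BY SQA_ID ORDER BY timestamp,
    starting a new group whenever the gap to the previous day is not 1.
    """
    result = []
    prev_id = prev_day = None
    grp = row_number = 0
    # Sort by ID first, then by date, like the window's partition and order.
    for day, sqa_id in sorted(rows, key=lambda r: (r[1], r[0])):
        if sqa_id != prev_id:
            # First row of a new ID: no previous row, so no flag is raised.
            grp, row_number = 0, 0
        elif (day - prev_day).days != 1:
            # Gap in the dates: the flag is set, so the running sum moves on.
            grp += 1
            row_number = 0
        row_number += 1
        result.append((day, sqa_id, grp, row_number))
        prev_id, prev_day = sqa_id, day
    return result


# The INACTIVE dates from the sample data (2020-04-20..22 and 2020-04-24..26).
sample = [(date(2020, 4, d), "SQA0199") for d in (20, 21, 22, 24, 25, 26)]
print([n for *_, n in number_runs(sample)])  # [1, 2, 3, 1, 2, 3]
```

The jump from 2020-04-22 to 2020-04-24 is the only gap larger than one day, so it is the only place where the group number changes and the row number restarts.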