Create groups based on a 14-day interval

I know this is a common question, but I couldn't find one that matches my situation. I have this data:

+---------+---------+
| user_id | view_dt |
+---------+---------+
| A       |     1/1 |
+---------+---------+
| A       |    1/10 |
+---------+---------+
| A       |    1/14 |
+---------+---------+
| A       |    1/22 |
+---------+---------+
| A       |    1/23 |
+---------+---------+
| A       |    1/30 |
+---------+---------+

I want to group this data into 14-day intervals. That is, the groups would be:

Group 1: 1/1, 1/10, 1/14

Group 2: 1/22, 1/23, 1/30

Note that 1/30 should belong to group 2, because it should be compared with the first date of group 2 (1/22), not with 1/1.
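The rule described above amounts to one pass over each user's dates in ascending order: remember the first date of the current group, and open a new group when a date is more than 14 days after it. Below is a minimal Python sketch of that rule. It assumes that a date exactly 14 days after the group's first date still belongs to the group, because the question does not say what happens at the boundary. `assign_groups` is a hypothetical helper name.

```python
from datetime import date

def assign_groups(dates, window_days=14):
    """Greedy grouping: a date joins the current group unless it is more than
    window_days after that group's first date, in which case it opens a new group.
    Assumption: exactly window_days apart still counts as the same group."""
    groups = []
    group_no = 0
    first_dt = None
    for d in sorted(dates):
        if first_dt is None or (d - first_dt).days > window_days:
            group_no += 1
            first_dt = d  # the new group is anchored on its own first date
        groups.append((d, group_no))
    return groups

views = [date(2022, 1, day) for day in (1, 10, 14, 22, 23, 30)]
for d, g in assign_groups(views):
    print(d.strftime("%m/%d"), g)
```

Each group's anchor depends on where the previous group started. So the group number cannot be computed from a row's distance to the user's very first date alone. This is why the answer walks the rows in order.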

The problem is that my own query puts 1/30 in group 3:

CREATE TABLE T (
    user_id VARCHAR(20),
    view_dt DATETIME
);

INSERT INTO T VALUES ('A', '2022-01-01');
INSERT INTO T VALUES ('A', '2022-01-10');
INSERT INTO T VALUES ('A', '2022-01-14');
INSERT INTO T VALUES ('A', '2022-01-22');
INSERT INTO T VALUES ('A', '2022-01-23');
INSERT INTO T VALUES ('A', '2022-01-30');

SELECT user_id,
       view_dt,
       DENSE_RANK() OVER (ORDER BY gr) grp
FROM (
    SELECT user_id,
           view_dt,
           CAST(view_dt - MIN(view_dt) OVER (PARTITION BY user_id ORDER BY view_dt) AS INT) / 14 + 1 AS gr
    FROM T
) x
ORDER BY user_id, view_dt

Desired output:

+---------+---------+-------+
| user_id | view_dt | group |
+---------+---------+-------+
| A       |     1/1 |     1 |
+---------+---------+-------+
| A       |    1/10 |     1 |
+---------+---------+-------+
| A       |    1/14 |     1 |
+---------+---------+-------+
| A       |    1/22 |     2 |
+---------+---------+-------+
| A       |    1/23 |     2 |
+---------+---------+-------+
| A       |    1/30 |     2 |
+---------+---------+-------+

Output of my query:

+---------+---------+-------+
| user_id | view_dt | group |
+---------+---------+-------+
| A       |     1/1 |     1 |
+---------+---------+-------+
| A       |    1/10 |     1 |
+---------+---------+-------+
| A       |    1/14 |     1 |
+---------+---------+-------+
| A       |    1/22 |     2 |
+---------+---------+-------+
| A       |    1/23 |     2 |
+---------+---------+-------+
| A       |    1/30 |     3 |
+---------+---------+-------+
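The 3 comes from the query's expression, which places every date in a fixed 14-day bucket counted from the user's very first date. The buckets therefore always start on 1/1, 1/15, 1/29, and so on. 1/30 is 29 days after 1/1, so it lands in bucket 29 / 14 + 1 = 3, even though group 2 only began on 1/22. Below is a minimal Python reproduction of that arithmetic. It assumes whole-day values as in the sample data. `fixed_bucket` is a hypothetical helper name.

```python
from datetime import date

def fixed_bucket(dates, window_days=14):
    """Reproduce the question's CAST(view_dt - MIN(view_dt) ...) / 14 + 1:
    every date is bucketed by its distance from the user's FIRST date."""
    first_dt = min(dates)
    # floor division of a non-negative day count, matching T-SQL INT / INT here
    return [(d, (d - first_dt).days // window_days + 1) for d in sorted(dates)]

views = [date(2022, 1, day) for day in (1, 10, 14, 22, 23, 30)]
for d, bucket in fixed_bucket(views):
    print(d.strftime("%m/%d"), (d - views[0]).days, bucket)
```

The outer `DENSE_RANK()` only renumbers these bucket values. It cannot move 1/30 back into group 2.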

One option is to use a recursive CTE:

-- Recursive CTE solution
with cte as
(
    -- CTE for adding a row_number
    select rn = row_number() over (partition by user_id order by view_dt),
           user_id, view_dt
    from   T
),
rcte as
(
    -- RCTE - anchor member
    -- first_dt is the first date of the group
    select rn, user_id, view_dt, grp = 1, first_dt = view_dt
    from   cte
    where  rn = 1
    
    union all
    
    -- RCTE - recursive member
    -- if date is more than 14 days from first_dt, grp + 1, update first_dt
    select c.rn, c.user_id, c.view_dt,
           grp = case when datediff(day, r.first_dt, c.view_dt) > 14
                      then r.grp + 1
                      else r.grp
                      end,
           first_dt = case when datediff(day, r.first_dt, c.view_dt) > 14
                      then c.view_dt
                      else r.first_dt
                      end
    from   cte c
           inner join rcte r on  c.user_id = r.user_id
                             and c.rn      = r.rn + 1
)
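select *
from   rcte
order  by user_id, rn
option (maxrecursion 0);  -- lift the default 100-level recursion limit (one level per view per user)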

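To try the recursive approach without a SQL Server instance, the sketch below ports the same CTE to SQLite through Python's standard `sqlite3` module. It rests on several assumptions. `julianday()` differences stand in for T-SQL's `datediff(day, ...)`. Dates are stored as ISO `YYYY-MM-DD` text. The SQLite build is 3.25 or newer, which `row_number()` requires. `group_views` is a hypothetical helper name.

```python
import sqlite3

def group_views(rows, window_days=14):
    """Run a SQLite port of the recursive CTE over (user_id, 'YYYY-MM-DD') rows
    and return (user_id, view_dt, grp) tuples ordered by user and date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (user_id TEXT, view_dt TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
    sql = """
    WITH RECURSIVE cte AS (
        -- number each user's views in date order
        SELECT row_number() OVER (PARTITION BY user_id ORDER BY view_dt) AS rn,
               user_id, view_dt
        FROM T
    ),
    rcte AS (
        -- anchor: each user's first view opens group 1
        SELECT rn, user_id, view_dt, 1 AS grp, view_dt AS first_dt
        FROM cte
        WHERE rn = 1
        UNION ALL
        -- recursive step: compare the next view with the current group's first date
        SELECT c.rn, c.user_id, c.view_dt,
               CASE WHEN julianday(c.view_dt) - julianday(r.first_dt) > :window_days
                    THEN r.grp + 1 ELSE r.grp END,
               CASE WHEN julianday(c.view_dt) - julianday(r.first_dt) > :window_days
                    THEN c.view_dt ELSE r.first_dt END
        FROM cte c
        JOIN rcte r ON c.user_id = r.user_id AND c.rn = r.rn + 1
    )
    SELECT user_id, view_dt, grp FROM rcte ORDER BY user_id, rn
    """
    result = conn.execute(sql, {"window_days": window_days}).fetchall()
    conn.close()
    return result

rows = [("A", "2022-01-%02d" % day) for day in (1, 10, 14, 22, 23, 30)]
for row in group_views(rows):
    print(row)
```

As in the T-SQL version, the recursion advances one row per user per step. Its depth therefore equals the largest number of views any single user has. That is fine for short histories but worth keeping in mind for long ones.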
Note: avoid using arithmetic operators on dates, as in view_dt - MIN(view_dt). Use datediff() instead. See Bad Habits to Kick: Using shorthand with date/time operations.