Create groups based on a 14-day interval
I know this is a common question, but I couldn't find one that matches my case. I have this data:
+---------+---------+
| user_id | view_dt |
+---------+---------+
| A | 1/1 |
+---------+---------+
| A | 1/10 |
+---------+---------+
| A | 1/14 |
+---------+---------+
| A | 1/22 |
+---------+---------+
| A | 1/23 |
+---------+---------+
| A | 1/30 |
+---------+---------+
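I want to group this data based on a 14-day interval. That is, the groups would be:
Group 1: 1/1, 1/10, 1/14
Group 2: 1/22, 1/23, 1/30
Note that the 1/30 date should belong to group 2, because 1/30 should be compared with the first date of group 2 (1/22), not with 1/1.
The problem I am running into is that my own query puts 1/30 in group 3.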
CREATE TABLE T (
    user_id VARCHAR(20),
    view_dt DATETIME
);
INSERT INTO T VALUES ('A', '2022-01-01');
INSERT INTO T VALUES ('A', '2022-01-10');
INSERT INTO T VALUES ('A', '2022-01-14');
INSERT INTO T VALUES ('A', '2022-01-22');
INSERT INTO T VALUES ('A', '2022-01-23');
INSERT INTO T VALUES ('A', '2022-01-30');
SELECT user_id,
       view_dt,
       DENSE_RANK() OVER(ORDER BY gr) grp
FROM (
    SELECT
        user_id,
        view_dt,
        CAST (view_dt - MIN (view_dt) OVER (PARTITION BY user_id ORDER BY view_dt) AS INT )/14 + 1 AS gr
    FROM T
) x
ORDER BY user_id
Desired output:
+---------+---------+-------+
| user_id | view_dt | group |
+---------+---------+-------+
| A | 1/1 | 1 |
+---------+---------+-------+
| A | 1/10 | 1 |
+---------+---------+-------+
| A | 1/14 | 1 |
+---------+---------+-------+
| A | 1/22 | 2 |
+---------+---------+-------+
| A | 1/23 | 2 |
+---------+---------+-------+
| A | 1/30 | 2 |
+---------+---------+-------+
Output of my current query:
+---------+---------+-------+
| user_id | view_dt | group |
+---------+---------+-------+
| A | 1/1 | 1 |
+---------+---------+-------+
| A | 1/10 | 1 |
+---------+---------+-------+
| A | 1/14 | 1 |
+---------+---------+-------+
| A | 1/22 | 2 |
+---------+---------+-------+
| A | 1/23 | 2 |
+---------+---------+-------+
| A | 1/30 | 3 |
+---------+---------+-------+
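The third group comes from the arithmetic itself: the query measures every date from the overall first date (1/1) and cuts the timeline into fixed 14-day buckets, so 1/30 (29 days after 1/1) gets bucket 29 / 14 + 1 = 3. A minimal Python sketch of that calculation, outside SQL Server, reproduces the output; the function name `fixed_bucket_groups` is hypothetical and used only for illustration.

```python
from datetime import date

# Sample data from the question (year 2022, as in the INSERT statements).
view_dates = [date(2022, 1, d) for d in (1, 10, 14, 22, 23, 30)]

def fixed_bucket_groups(dates, interval_days=14):
    """Mimic the question's arithmetic: integer-divide the distance
    from the overall first date by the interval, then add 1."""
    first = min(dates)
    return [(d - first).days // interval_days + 1 for d in sorted(dates)]

fixed = fixed_bucket_groups(view_dates)
print(fixed)  # [1, 1, 1, 2, 2, 3]
```

Because the bucket boundaries never move, a new group cannot restart its own 14-day window at 1/22, which is what the desired output requires.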
One option is to use a recursive CTE:
-- Recursive CTE solution
with cte as
(
    -- CTE for adding a row_number
    select rn = row_number() over (partition by user_id order by view_dt),
           user_id, view_dt
    from T
),
rcte as
(
    -- RCTE - anchor member
    -- first_dt is the first date of the group
    select rn, user_id, view_dt, grp = 1, first_dt = view_dt
    from cte
    where rn = 1
    union all
    -- RCTE - recursive member
    -- if date is more than 14 days from first_dt, grp + 1, update first_dt
    select c.rn, c.user_id, c.view_dt,
           grp = case when datediff(day, r.first_dt, c.view_dt) > 14
                      then r.grp + 1
                      else r.grp
                 end,
           first_dt = case when datediff(day, r.first_dt, c.view_dt) > 14
                           then c.view_dt
                           else r.first_dt
                      end
    from cte c
    inner join rcte r on c.user_id = r.user_id
                     and c.rn = r.rn + 1
)
select *
from rcte
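The recursion walks the rows in date order and carries two values forward: the current group number and the first date of that group. The same rule can be sketched as a plain loop in Python, which may make the logic easier to trace by hand. This is only an illustration of the rule, not a replacement for the query; the function name `anchored_groups` is hypothetical, and it handles a single user's dates.

```python
from datetime import date

def anchored_groups(dates, interval_days=14):
    """Assign group numbers so that each group starts at the first date
    lying more than interval_days after the current group's first date."""
    groups = []
    grp = 0
    first_dt = None  # first date of the current group
    for d in sorted(dates):
        # Same test as datediff(day, first_dt, view_dt) > 14 in the query.
        if first_dt is None or (d - first_dt).days > interval_days:
            grp += 1
            first_dt = d
        groups.append((d, grp))
    return groups

# Sample data from the question (year 2022, as in the INSERT statements).
view_dates = [date(2022, 1, d) for d in (1, 10, 14, 22, 23, 30)]
labels = [g for _, g in anchored_groups(view_dates)]
print(labels)  # [1, 1, 1, 2, 2, 2]
```

With the `> 14` comparison, a date exactly 14 days after the group's first date stays in the same group; change it to `>=` in both the sketch and the query if that date should open a new group.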
Note: avoid using arithmetic operators on dates, as in view_dt - MIN (view_dt). Use datediff() instead.
See Bad Habits to Kick : Using shorthand with date/time operations