How can I group rows in SQL based on a condition?
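I am using Redshift SQL and want to group users whose voucher periods overlap into a single row (showing the minimum start date and the maximum end date).
For example, if I have these records,
I want to achieve this result in Redshift.
To be precise: because the dates in rows 1 and 2 overlap, I want to combine them and get min(Start_date) and max(End_Date).
I really don't know where to start. I tried using row_number to partition them, but it doesn't seem to work well. Here is what I tried.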
select
    id,
    start_date,
    end_date,
    lag(end_date, 1) over (partition by id order by start_date) as prev_end_date,
    row_number() over (partition by id, (case when prev_end_date >= start_date then 1 else 0 end) order by start_date) as rn
from users
Any suggestions? Thanks, everyone.
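This is a type of gaps-and-islands problem. Because the dates are arbitrary, I suggest the following approach:
- Use a cumulative maximum to get the largest end_date among the rows before the current one.
- Use a comparison to determine when there is no overlap (that is, when a new period starts).
- A cumulative sum of those starts gives each group an identifier.
- Then aggregate.
As SQL: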
select id, min(start_date) as start_date, max(end_date) as end_date
from (select u.*,
             -- running count of period starts = group identifier
             sum(case when prev_end_date >= start_date then 0 else 1 end)
                 over (partition by id
                       order by start_date, voucher_code
                       rows between unbounded preceding and current row
                      ) as grp
      from (select u.*,
                   -- largest end_date among the earlier rows for this id
                   max(end_date) over (partition by id
                                       order by start_date, voucher_code
                                       rows between unbounded preceding and 1 preceding
                                      ) as prev_end_date
            from users u
           ) u
     ) u
group by id, grp;
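To make the steps easier to trace by hand, here is a minimal Python sketch of the same logic: a running maximum of end_date, a flag for each new period, a running sum of flags as the group id, and a final aggregation. The function name `merge_overlapping` and the sample rows are hypothetical, and the sketch orders rows by (start_date, end_date) because it has no voucher_code column to break ties with.

```python
from collections import defaultdict
from datetime import date


def merge_overlapping(rows):
    """rows: iterable of (id, start_date, end_date) tuples.

    Returns one (id, min start_date, max end_date) tuple per island.
    """
    by_id = defaultdict(list)
    for user_id, start, end in rows:
        by_id[user_id].append((start, end))

    result = []
    for user_id in sorted(by_id):
        grp = 0              # running sum of "new period" flags
        prev_max_end = None  # running max(end_date) over the earlier rows
        groups = {}          # grp -> [min start_date, max end_date]
        for start, end in sorted(by_id[user_id]):
            # Same test as the SQL: overlap when prev_end_date >= start_date.
            overlaps = prev_max_end is not None and prev_max_end >= start
            if not overlaps:
                grp += 1
                groups[grp] = [start, end]
            else:
                groups[grp][0] = min(groups[grp][0], start)
                groups[grp][1] = max(groups[grp][1], end)
            prev_max_end = end if prev_max_end is None else max(prev_max_end, end)
        for g in sorted(groups):
            result.append((user_id, groups[g][0], groups[g][1]))
    return result


# Hypothetical sample data, not taken from the question.
sample_rows = [
    (1, date(2020, 1, 1), date(2020, 1, 10)),
    (1, date(2020, 1, 2), date(2020, 1, 3)),   # fully inside the first range
    (1, date(2020, 1, 5), date(2020, 1, 20)),  # overlaps the first range only
    (1, date(2020, 2, 1), date(2020, 2, 5)),
    (2, date(2020, 3, 1), date(2020, 3, 2)),
]
merged_rows = merge_overlapping(sample_rows)
```

The third sample row shows why a running maximum is used instead of the previous row's end_date: it does not overlap the short second row, but it does overlap the first.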
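Another approach is to use a recursive CTE:
- Number all rows within partitions grouped by id and ordered by start_date and end_date.
- Iteratively compute a group_start_date for each row (rows that must be merged in the final result end up with the same group_start_date).
- Finally, group the CTE by id and group_start_date, taking the maximum end_date of each group.
Here is the corresponding sqlfiddle: http://sqlfiddle.com/#!18/7059b/2
And the SQL, just in case: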
-- Engines such as Redshift and PostgreSQL need WITH RECURSIVE here
WITH cteSequencing AS (
    -- Number each user's rows in start_date, end_date order
    SELECT *, start_date AS group_start_date,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date, end_date) AS iSequence
    FROM users),
Recursion AS (
    -- Anchor: the first row for each id
    SELECT *
    FROM cteSequencing
    WHERE iSequence = 1
    UNION ALL
    -- Remaining rows: inherit the previous row's group_start_date on overlap
    -- (the test looks only at the immediately preceding row's end_date)
    SELECT b.id, b.start_date, b.end_date,
           CASE WHEN a.end_date > b.start_date THEN a.group_start_date
                ELSE b.start_date
           END AS group_start_date,
           b.iSequence
    FROM Recursion AS a
    INNER JOIN cteSequencing AS b ON a.iSequence + 1 = b.iSequence AND a.id = b.id)
SELECT id, group_start_date AS start_date, MAX(end_date) AS end_date
FROM Recursion
GROUP BY id, group_start_date
ORDER BY id, group_start_date
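The row-by-row iteration in the recursive CTE can also be sketched as a plain loop in Python. This is an illustration under stated assumptions, not a translation of the query: the function name `group_start_dates` and the sample rows are hypothetical, and the sketch deliberately carries the running maximum end date of the current group, whereas the SQL compares against only the immediately preceding row's end_date. It keeps the SQL's strict `>` comparison, so ranges that merely touch are not merged.

```python
from datetime import date
from itertools import groupby


def group_start_dates(rows):
    """rows: iterable of (id, start_date, end_date) tuples.

    Returns (id, group_start_date, max end_date) tuples, ordered by id
    and group_start_date.
    """
    ordered = sorted(rows)  # by id, then start_date, then end_date
    merged = {}             # (id, group_start_date) -> max end_date
    for user_id, user_rows in groupby(ordered, key=lambda r: r[0]):
        group_start = None
        running_end = None  # assumption: running max, not just the previous row
        for _, start, end in user_rows:
            if running_end is not None and running_end > start:
                # Overlap: the row inherits the current group_start_date.
                running_end = max(running_end, end)
            else:
                # No overlap: this row starts a new group.
                group_start = start
                running_end = end
            merged[(user_id, group_start)] = running_end
    # Dicts keep insertion order, which is already (id, group_start_date).
    return [(uid, gs, end) for (uid, gs), end in merged.items()]


# Hypothetical sample data, not taken from the question.
example_rows = [
    (1, date(2020, 1, 1), date(2020, 1, 10)),
    (1, date(2020, 1, 2), date(2020, 1, 3)),
    (1, date(2020, 1, 5), date(2020, 1, 20)),
    (1, date(2020, 1, 20), date(2020, 1, 25)),  # only touches the group before it
    (2, date(2020, 3, 1), date(2020, 3, 2)),
]
grouped_rows = group_start_dates(example_rows)
```

If touching ranges should be merged as well, the comparison would become `>=`, which matches the test used in the window-function answer.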