Redshift - Group table based on consecutive rows
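I am currently working with this table:
What I would like to do is tidy the table up a bit by merging some consecutive rows together.
Is there any way to achieve this?
The first table already works fine; I just want to remove some rows to free up some disk space.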
One method is to peek at the previous row to see when the value changes. Assuming valid_to and valid_from really are dates:
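-- Merge consecutive rows of the same (id, class) whose date ranges touch or overlap
select id, class, min(valid_from) as valid_from, max(valid_to) as valid_to
from (select t.*,
             -- Running count of group starts: a row starts a new group unless the previous
             -- row of the same class ended on or after the day before this row starts
             -- (the first row has a NULL prev_valid_to, so it always starts a group)
             sum(case when prev_valid_to >= valid_from + interval '-1 day' then 0 else 1 end)
                 over (partition by id, class order by valid_to
                       rows between unbounded preceding and current row) as grp
      from (select t.*,
                   -- End date of the previous row for the same id and class
                   lag(valid_to) over (partition by id, class order by valid_to) as prev_valid_to
            from t
           ) t
     ) t
group by id, class, grp;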
If they are not dates, this is trickier. You can convert them to dates, or you can use the difference of row numbers:
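-- Within an id, every run of consecutive same-class rows shares the same
-- difference between the two row numbers, so that difference labels the run
select id, class, min(valid_from) as valid_from, max(valid_to) as valid_to
from (select t.*,
             row_number() over (partition by id order by valid_from) as seqnum,
             row_number() over (partition by id, class order by valid_from) as seqnum_2
      from t
     ) t
group by id, class, (seqnum - seqnum_2);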
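The row-number difference can be sketched the same way in Python, assuming rows are tuples (id, class, valid_from, valid_to) whose from/to values only need to be sortable (integers or strings work as well as dates); merge_by_row_number_diff is a hypothetical name. Within an id, each run of same-class rows keeps a constant seqnum - seqnum_2, so that value acts as the group key.

```python
def merge_by_row_number_diff(rows):
    """Hypothetical helper mirroring the row_number() difference query.

    rows: iterable of (id, class, valid_from, valid_to); the from/to values only
    need to be sortable. Returns (id, class, min(valid_from), max(valid_to)) per run.
    """
    seqnum = {}    # row_number() over (partition by id order by valid_from)
    seqnum_2 = {}  # row_number() over (partition by id, class order by valid_from)
    islands = {}   # (id, class, seqnum - seqnum_2) -> (min valid_from, max valid_to)
    for rid, rclass, valid_from, valid_to in sorted(rows, key=lambda r: (r[0], r[2])):
        seqnum[rid] = seqnum.get(rid, 0) + 1
        seqnum_2[(rid, rclass)] = seqnum_2.get((rid, rclass), 0) + 1
        key = (rid, rclass, seqnum[rid] - seqnum_2[(rid, rclass)])
        if key in islands:
            lo, hi = islands[key]
            islands[key] = (min(lo, valid_from), max(hi, valid_to))
        else:
            islands[key] = (valid_from, valid_to)
    return [(rid, rclass, lo, hi) for (rid, rclass, _), (lo, hi) in islands.items()]


# Hypothetical sample: integer period numbers stand in for non-date values
rows = [(1, "A", 1, 1), (1, "A", 2, 2), (1, "B", 3, 3), (1, "A", 4, 4), (1, "A", 5, 5)]
print(merge_by_row_number_diff(rows))
```

Unlike the date-based version, this one looks only at row order: two rows of the same class with no other class between them are merged even if there is a gap in their ranges.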