SQL query for grouping monthly period ranges
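I'm having trouble building a query that groups items into monthly ranges according to whether the item exists in a given month. I'm using PostgreSQL.
For example, I have a table with data like this: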
Name Period(text)
Ana 2010/09
Ana 2010/10
Ana 2010/11
Ana 2010/12
Ana 2011/01
Ana 2011/02
Peter 2009/05
Peter 2009/06
Peter 2009/07
Peter 2009/08
Peter 2009/12
Peter 2010/01
Peter 2010/02
Peter 2010/03
John 2009/05
John 2009/06
John 2009/09
John 2009/11
John 2009/12
I want the query result to look like this:
Name Start End
Ana 2010/09 2011/02
Peter 2009/05 2009/08
Peter 2009/12 2010/03
John 2009/05 2009/06
John 2009/09 2009/09
John 2009/11 2009/12
Is there any way to achieve this?
I don't know if there is an easier way (there probably is), but I can't think of one right now:
with parts as (
  select name,
         to_date(replace(period,'/',''), 'yyyymm') as period
  from names
), flagged as (
  select name,
         period,
         case
            when lag(period,1, (period - interval '1' month)::date) over (partition by name order by period) = (period - interval '1' month)::date then null
            else 1
         end as group_flag
  from parts
), grouped as (
  select flagged.*,
         coalesce(sum(group_flag) over (partition by name order by period),0) as group_nr
  from flagged
)
select name, min(period), max(period)
from grouped
group by name, group_nr
order by name, min(period);
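The first common table expression (parts) simply converts the period into a date, so that it can be used in arithmetic expressions.
The second CTE (flagged) assigns a flag every time the gap (in months) between the current row and the previous row is not one.
The third CTE then accumulates those flags to define a unique group number for each run of consecutive rows.
The final select then simply gets the start and end period of each group. I didn't bother converting the period back to the original format, though.
The SQLFiddle example also shows the intermediate result of the flagged CTE:
http://sqlfiddle.com/#!15/8c0aa/2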
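To make the flag-and-running-sum idea easier to follow, here is a minimal Python sketch that traces the same three steps on Peter's rows from the question. It is only an illustration of the logic, not a replacement for the query; the helper name month_index is an assumption introduced for this example.

```python
from itertools import accumulate


def month_index(period: str) -> int:
    """Turn 'YYYY/MM' into a running month count (hypothetical helper)."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)


# Peter's rows from the question, already sorted by period.
periods = ["2009/05", "2009/06", "2009/07", "2009/08",
           "2009/12", "2010/01", "2010/02", "2010/03"]
months = [month_index(p) for p in periods]

# "flagged": 1 when the previous row is not exactly one month earlier.
# The first row gets 0, mirroring the lag() default in the query.
flags = [0] + [0 if cur - prev == 1 else 1
               for prev, cur in zip(months, months[1:])]

# "grouped": the running sum of the flags is the group number.
group_nr = list(accumulate(flags))

# Final select: first and last period per group number.
ranges = {}
for period, nr in zip(periods, group_nr):
    start, _ = ranges.get(nr, (period, period))
    ranges[nr] = (start, period)

result = list(ranges.values())
print(result)
```

The gap between 2009/08 and 2009/12 is the only place where a flag is raised, so the running sum splits the rows into exactly two groups.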
Well, a common way to do this might be recursive SQL:
with recursive cte1 as (
    select
        "Name" as name,
        ("Period"||'/01')::date as period
    from Table1
), cte2 as (
    select
        c.name, c.period as s, c.period as e
    from cte1 as c
    where not exists (select * from cte1 as t where t.name = c.name and t.period = c.period - interval '1 month')
    union all
    select
        c.name, c.s as s, t.period
    from cte2 as c
        inner join cte1 as t on t.name = c.name and t.period = c.e + interval '1 month'
)
select
    c.name, to_char(c.s, 'YYYY/MM') as "Start", to_char(max(c.e), 'YYYY/MM') as "End"
from cte2 as c
group by c.name, c.s
order by 1, 2
I'm not sure how this performs; you'd have to test it.
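As a rough illustration of what the recursion does, the following Python sketch finds the months that have no predecessor (the anchor part of cte2) and then extends each one month by month (the recursive part), using John's rows from the question. The helpers month_index and to_period are assumptions made for this example. Unlike the SQL, which emits every intermediate row and then takes max(e), the loop keeps only the final end of each range.

```python
def month_index(period: str) -> int:
    """Turn 'YYYY/MM' into a running month count (hypothetical helper)."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)


def to_period(index: int) -> str:
    """Inverse of month_index: running month count back to 'YYYY/MM'."""
    year, month = divmod(index - 1, 12)
    return f"{year:04d}/{month + 1:02d}"


rows = [("John", "2009/05"), ("John", "2009/06"), ("John", "2009/09"),
        ("John", "2009/11"), ("John", "2009/12")]
present = {(name, month_index(period)) for name, period in rows}

# Anchor step: a row starts a range when the previous month is missing.
starts = sorted((n, m) for n, m in present if (n, m - 1) not in present)

# Recursive step, written as a loop: extend while the next month exists.
result = []
for name, start in starts:
    end = start
    while (name, end + 1) in present:
        end += 1
    result.append((name, to_period(start), to_period(end)))

print(result)
```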
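This is an aggregation problem, but with a twist: you need to define groups of adjacent months for each name.
Assuming that a month never appears more than once for a given name, you can do this by assigning a "month" number to each period and subtracting a sequential number. For consecutive months, the difference stays constant.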
select name, min(period), max(period)
from (select t.*,
             (cast(left(period, 4) as int) * 12 + cast(right(period, 2) as int) -
              row_number() over (partition by name order by period)
             ) as grp
      from names t
     ) t
group by grp, name;
Here is a SQL Fiddle illustrating this.
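The arithmetic behind grp can be traced by hand. The Python sketch below computes the same "month number minus row number" value for John's rows from the question and groups on it; month_number is a hypothetical helper that mirrors the cast(left(...)) * 12 + cast(right(...)) expression.

```python
def month_number(period: str) -> int:
    """Mirror of the SQL expression: year * 12 + month (hypothetical helper)."""
    return int(period[:4]) * 12 + int(period[5:])


# John's rows from the question, sorted by period.
periods = ["2009/05", "2009/06", "2009/09", "2009/11", "2009/12"]

# row_number() starts at 1 within each name partition.
grp = [month_number(p) - rn for rn, p in enumerate(periods, start=1)]

# group by grp: first and last period sharing the same difference.
ranges = {}
for period, key in zip(periods, grp):
    start, _ = ranges.get(key, (period, period))
    ranges[key] = (start, period)

result = list(ranges.values())
print(grp)
print(result)
```

Each missing month makes the month number jump ahead of the row number, so the difference changes exactly where a gap occurs.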
Note: duplicates are not really a problem either. You would just use dense_rank() instead of row_number().
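To see why the choice of ranking function matters, here is a small Python sketch with a made-up duplicated month (the input is an assumption, not data from the question). With row_number the duplicate shifts the difference and splits one run into two groups; with dense_rank, equal periods share a rank and the run stays together.

```python
def month_number(period: str) -> int:
    """Year * 12 + month for a 'YYYY/MM' string (hypothetical helper)."""
    return int(period[:4]) * 12 + int(period[5:])


def group_keys(periods, ranks):
    """Month number minus rank, i.e. the grp value of the query."""
    return [month_number(p) - r for p, r in zip(periods, ranks)]


# Hypothetical input: 2009/05 appears twice for the same name.
periods = ["2009/05", "2009/05", "2009/06"]

# row_number(): 1, 2, 3 regardless of ties.
row_number = list(range(1, len(periods) + 1))

# dense_rank(): ties share a rank and no ranks are skipped.
dense = {p: i for i, p in enumerate(sorted(set(periods)), start=1)}
dense_rank = [dense[p] for p in periods]

with_row_number = group_keys(periods, row_number)
with_dense_rank = group_keys(periods, dense_rank)

print(len(set(with_row_number)))  # number of groups using row_number
print(len(set(with_dense_rank)))  # number of groups using dense_rank
```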