SQL query for grouping monthly period ranges

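I'm having trouble building a query that groups items into monthly ranges, based on whether an item exists in each consecutive month. I'm using PostgreSQL.

For example, I have a table with data like this: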

Name    Period(text)
Ana     2010/09
Ana     2010/10
Ana     2010/11
Ana     2010/12
Ana     2011/01
Ana     2011/02
Peter   2009/05
Peter   2009/06
Peter   2009/07
Peter   2009/08
Peter   2009/12
Peter   2010/01
Peter   2010/02
Peter   2010/03
John    2009/05
John    2009/06
John    2009/09
John    2009/11
John    2009/12

I would like the query result to look like this:

Name    Start     End
Ana     2010/09   2011/02
Peter   2009/05   2009/08
Peter   2009/12   2010/03
John    2009/05   2009/06
John    2009/09   2009/09
John    2009/11   2009/12

Is there any way to achieve this?

I don't know whether there is an easier way (there probably is), but I can't think of one right now:

with parts as (
  select name, 
         to_date(replace(period,'/',''), 'yyyymm') as period
  from names
), flagged as (
  select name, 
         period, 
         case 
           when lag(period,1, (period - interval '1' month)::date) over (partition by name order by period) = (period - interval '1' month)::date then null
           else 1
         end as group_flag
  from parts
), grouped as (
  select flagged.*, 
         coalesce(sum(group_flag) over (partition by name order by period),0) as group_nr
  from flagged
)
select name, min(period), max(period)
from grouped
group by name, group_nr
order by name, min(period);

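The first common table expression (parts) simply converts the period into a date so that it can be used in arithmetic expressions.

The second CTE (flagged) assigns a flag whenever the gap between the current row and the previous row is not exactly one month.

The third CTE then accumulates those flags with a running sum, which gives every run of consecutive rows its own group number.

The final select then simply takes the first and last period of each group. I didn't bother converting the periods back to the original format, though.

The SQLFiddle example also shows the intermediate result of the flagged CTE:
http://sqlfiddle.com/#!15/8c0aa/2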
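To make the flag-then-accumulate idea easier to follow, here is a minimal sketch in plain Python that traces the same steps on the sample data from the question. It is not part of the original answer. The helper names (month_index, flag_and_number, ranges) are made up for illustration, and a month counter stands in for the date arithmetic of the SQL version.

```python
from itertools import groupby

# Sample rows from the question: (name, period as 'YYYY/MM' text).
rows = [
    ("Ana", "2010/09"), ("Ana", "2010/10"), ("Ana", "2010/11"),
    ("Ana", "2010/12"), ("Ana", "2011/01"), ("Ana", "2011/02"),
    ("Peter", "2009/05"), ("Peter", "2009/06"), ("Peter", "2009/07"),
    ("Peter", "2009/08"), ("Peter", "2009/12"), ("Peter", "2010/01"),
    ("Peter", "2010/02"), ("Peter", "2010/03"),
    ("John", "2009/05"), ("John", "2009/06"), ("John", "2009/09"),
    ("John", "2009/11"), ("John", "2009/12"),
]

def month_index(period):
    """Map 'YYYY/MM' to an integer so that consecutive months differ by 1."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)

def flag_and_number(rows):
    """Return (name, period, group_flag, group_nr), mirroring the grouped CTE."""
    out = []
    # sorted() + groupby() plays the role of "partition by name order by period";
    # 'YYYY/MM' text sorts chronologically.
    for name, items in groupby(sorted(rows), key=lambda r: r[0]):
        prev_idx = None
        group_nr = 0
        for _, period in items:
            idx = month_index(period)
            # Flag is 1 only when the previous row is not exactly one month earlier.
            flag = 0 if prev_idx is None or idx - prev_idx == 1 else 1
            group_nr += flag  # running sum of the flags
            out.append((name, period, flag, group_nr))
            prev_idx = idx
    return out

def ranges(rows):
    """Take the first and last period per (name, group_nr), like the final select."""
    bounds = {}
    for name, period, _, group_nr in flag_and_number(rows):
        start, _ = bounds.get((name, group_nr), (period, period))
        bounds[(name, group_nr)] = (start, period)  # rows arrive in period order
    return sorted((name, s, e) for (name, _), (s, e) in bounds.items())

for name, start, end in ranges(rows):
    print(name, start, end)
```

Unlike the SQL, the sketch uses 0 rather than null for "no gap", which is what the coalesce in the third CTE ends up doing anyway.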

Well, one common way to do this could be recursive SQL:

with recursive cte1 as (
    select
        "Name" as name,
        ("Period"||'/01')::date as period
    from Table1
), cte2 as (
    select
        c.name, c.period as s, c.period as e
    from cte1 as c
    where not exists (select * from cte1 as t where t.name = c.name and t.period = c.period - interval '1 month')

    union all

    select
        c.name, c.s as s, t.period
    from cte2 as c
        inner join cte1 as t on t.name = c.name and t.period = c.e + interval '1 month'

)   
select
    c.name, to_char(c.s, 'YYYY/MM') as "Start", to_char(max(c.e), 'YYYY/MM') as "End"
from cte2 as c
group by c.name, c.s
order by 1, 2

I'm not sure how well this performs; you will have to test it.

sql fiddle demo
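As a rough illustration of what the recursive query does, the following Python sketch (not from the original answer) starts a range at every month that has no preceding month for the same name, then extends each range one month at a time while the next month exists. The names month_index, to_period and island_ranges are hypothetical, and only the John and Peter rows are used to keep it short.

```python
rows = [
    ("Peter", "2009/05"), ("Peter", "2009/06"), ("Peter", "2009/07"),
    ("Peter", "2009/08"), ("Peter", "2009/12"), ("Peter", "2010/01"),
    ("Peter", "2010/02"), ("Peter", "2010/03"),
    ("John", "2009/05"), ("John", "2009/06"), ("John", "2009/09"),
    ("John", "2009/11"), ("John", "2009/12"),
]

def month_index(period):
    """Map 'YYYY/MM' to an integer so that consecutive months differ by 1."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)

def to_period(idx):
    """Inverse of month_index: turn the integer back into 'YYYY/MM'."""
    year, month0 = divmod(idx - 1, 12)
    return f"{year:04d}/{month0 + 1:02d}"

def island_ranges(rows):
    present = {(name, month_index(p)) for name, p in rows}
    # Anchor step: a month starts a range when the previous month is absent
    # (the "where not exists" branch of the recursive CTE).
    frontier = [(name, idx, idx) for name, idx in present
                if (name, idx - 1) not in present]
    last_month = {}
    # Recursive step: extend every range by one month while the next one exists.
    while frontier:
        next_frontier = []
        for name, start, end in frontier:
            key = (name, start)
            last_month[key] = max(last_month.get(key, end), end)  # max(c.e)
            if (name, end + 1) in present:
                next_frontier.append((name, start, end + 1))
        frontier = next_frontier
    return sorted((name, to_period(s), to_period(e))
                  for (name, s), e in last_month.items())

for name, start, end in island_ranges(rows):
    print(name, start, end)
```

The loop runs once per month in the longest range, which hints at why the recursive form may get slow on long runs of months.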

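This is an aggregation problem, but with a twist: you need to define groups of adjacent months for each name.

Assuming that a month never appears more than once for a given name, you can do this by assigning a "month" number to each period and subtracting a sequential number. For consecutive months, the difference stays constant.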

select name, min(period), max(period)
from (select t.*,
             (cast(left(period, 4) as int) * 12 + cast(right(period, 2) as int) -
              row_number() over (partition by name order by period)
             ) as grp
      from names t
     ) t
group by grp, name;

Here is a SQL Fiddle illustrating this.
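The arithmetic behind grp can be checked by hand. The sketch below is a Python illustration, not part of the original answer, with the made-up helper name grp_keys. It computes the month number, the row number and their difference for John's rows from the question.

```python
from itertools import groupby

rows = [
    ("John", "2009/05"), ("John", "2009/06"), ("John", "2009/09"),
    ("John", "2009/11"), ("John", "2009/12"),
]

def grp_keys(rows):
    """Return (name, period, month_no, row_number, grp) for every row."""
    out = []
    # sorted() + groupby() stands in for "partition by name order by period".
    for name, items in groupby(sorted(rows), key=lambda r: r[0]):
        for rn, (_, period) in enumerate(items, start=1):
            # Same expression as the SQL: left(period, 4) * 12 + right(period, 2)
            month_no = int(period[:4]) * 12 + int(period[-2:])
            out.append((name, period, month_no, rn, month_no - rn))
    return out

for row in grp_keys(rows):
    print(row)
# ('John', '2009/05', 24113, 1, 24112)
# ('John', '2009/06', 24114, 2, 24112)
# ('John', '2009/09', 24117, 3, 24114)
# ('John', '2009/11', 24119, 4, 24115)
# ('John', '2009/12', 24120, 5, 24115)
```

Within a run of consecutive months both numbers grow by one per row, so the difference does not move. A gap makes the month number jump ahead of the row number, which yields a new grp value.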

Note: duplicates are not really a problem either. You would just use dense_rank() instead of row_number().
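To see why dense_rank() copes with duplicates, here is a small Python comparison (an illustration only; the function names and the duplicated 2009/06 row are hypothetical). Duplicate periods share one dense rank, so the difference stays constant, whereas row numbers keep increasing and split the run.

```python
def month_no(period):
    """Map 'YYYY/MM' to an integer so that consecutive months differ by 1."""
    return int(period[:4]) * 12 + int(period[-2:])

def grp_with_row_number(periods):
    """month number minus row_number(): every row gets its own sequence number."""
    ordered = sorted(periods)
    return [month_no(p) - rn for rn, p in enumerate(ordered, start=1)]

def grp_with_dense_rank(periods):
    """month number minus dense_rank(): equal periods share the same rank."""
    ordered = sorted(periods)
    rank = {p: r for r, p in enumerate(sorted(set(periods)), start=1)}
    return [month_no(p) - rank[p] for p in ordered]

# One name, three consecutive months, with 2009/06 appearing twice.
periods = ["2009/05", "2009/06", "2009/06", "2009/07"]

print(grp_with_row_number(periods))  # [24112, 24112, 24111, 24111] -> run is split
print(grp_with_dense_rank(periods))  # [24112, 24112, 24112, 24112] -> one group
```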