Gaps and Islands Months only

Question

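I am working with a data set of clients and their date records. I am trying to apply a gaps-and-islands approach using only MONTHS (currently a varchar 'YYYYMM'). I need to take the individual records and group them by the gaps in months (regardless of year). I can't figure out how to go from: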

drop table RUNNING_LOG;

create table running_log ( 
  run_date          date not null,  
  time_in_seconds   int  not null, 
  distance_in_miles int  not null,
  SERV_YRMO         VARCHAR2(6)
);

insert into running_log values (date'2018-01-01', 420, 1,'201801');  
insert into running_log values (date'2018-01-02', 2400, 5,'201801');  
insert into running_log values (date'2018-01-03', 2430, 5,'201801');  
insert into running_log values (date'2018-01-06', 2350, 5,'201801');  
insert into running_log values (date'2018-02-07', 410, 1,'201802');  
insert into running_log values (date'2018-02-10', 400, 1,'201802');  
insert into running_log values (date'2018-02-13', 2300, 5,'201802');  
insert into running_log values (date'2018-12-31', 425, 1,'201803');  
insert into running_log values (date'2019-01-01', 422, 1,'201901');  
insert into running_log values (date'2019-01-06', 2350, 5,'201901');  
insert into running_log values (date'2019-02-07', 410, 1,'201902');  
insert into running_log values (date'2019-02-10', 400, 1,'201902');  
insert into running_log values (date'2019-02-13', 2300, 5,'201902');  
insert into running_log values (date'2019-03-14', 425, 1,'201903');  
insert into running_log values (date'2019-03-15', 422, 1,'201903');
insert into running_log values (date'2020-03-01', 425, 1,'202003');  
insert into running_log values (date'2021-03-31', 422, 1,'202103');   

commit;  

select * from running_log;

To:

Answer 1

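One method is to use dense_rank() and truncate the dates to months:

select to_char(min(run_date), 'YYYY-MM') as first_month,
       to_char(max(run_date), 'YYYY-MM') as last_month,
       sum(distance_in_miles)            as total_distance
from (select rl.*,
             -- consecutive months receive consecutive ranks
             dense_rank() over (order by trunc(run_date, 'Month')) as seqnum
      from running_log rl
     ) rl
-- month minus rank stays constant within a run of consecutive months
group by trunc(run_date, 'Month') - seqnum * interval '1' month
order by min(run_date);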

Answer 2

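The solution below uses the tabibitosan method to create the groups. If you are not familiar with the concept, search for it; you will find many good write-ups. (It is sometimes also called the "fixed differences" method.) The heart of the method is the creation of the groups in the subquery. Select the subquery and run it by itself, without the outer query, to see what it does. Pay particular attention to the GRP column in the subquery; if you ask yourself how it does that, that is the point where you need to read up on the method.

As I explained in a comment under your question, the SERV_YRMO column is not needed (if it is computed from the RUN_DATE value), and in fact your INSERT statements contain errors in that column. The solution below uses only RUN_DATE. You may as well drop the SERV_YRMO column, which will only cause trouble.

Note also that, as I pointed out in another comment under your question, your arithmetic seems to be wrong. For that reason my output differs from yours.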

select to_char(min(run_date), 'yyyymm') as min_yrmo,
       to_char(max(run_date), 'yyyymm') as max_yrmo,
       sum(distance_in_miles)           as total_distance
from   (
         select rl.*, 
                add_months( trunc(run_date, 'mm'), 
                            -dense_rank() over (order by trunc(run_date, 'mm'))
                          ) as grp
         from   running_log rl
       )
group  by grp
order  by min_yrmo
;

MIN_YRMO MAX_YRMO TOTAL_DISTANCE
-------- -------- --------------
201801   201802               23
201812   201903               16
202003   202003                1
202103   202103                1
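
To make the GRP column easier to inspect, here is a minimal sketch that runs the subquery on its own, reduced to one row per month. The DISTINCT, the TO_CHAR formatting and the column aliases YRMO and RNK are additions for readability and are not part of the solution above.

```sql
-- One row per distinct month: its rank and the "fixed difference" (month minus rank).
-- Months that are consecutive share the same GRP value.
select distinct
       to_char(trunc(run_date, 'mm'), 'yyyymm')            as yrmo,
       dense_rank() over (order by trunc(run_date, 'mm'))  as rnk,
       to_char(add_months( trunc(run_date, 'mm'),
                           -dense_rank() over (order by trunc(run_date, 'mm'))
                         ), 'yyyymm')                      as grp
from   running_log
order  by yrmo
;
```

With the sample data from the question this should return:

YRMO   RNK GRP
------ --- ------
201801   1 201712
201802   2 201712
201812   3 201809
201901   4 201809
201902   5 201809
201903   6 201809
202003   7 201908
202103   8 202007

Each step forward by one month also raises the rank by one, so the difference stays fixed. A skipped month moves the date forward by more than the rank, and GRP changes. The four distinct GRP values correspond to the four rows of the output above.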

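Edit

The OP's version is 11 of some description. Still, for readers who have the same question and are on Oracle 12.1 or higher, MATCH_RECOGNIZE can be used for a more efficient solution. It looks like this: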

select *
from   running_log
match_recognize(
  order    by run_date
  measures to_char(first(run_date), 'yyyymm') as min_yrmo,
           to_char(last (run_date), 'yyyymm') as max_yrmo,
           sum(distance_in_miles)             as total_distance
  pattern  ( a b* )
  define   b as run_date < add_months(trunc(prev(run_date), 'mm'), 2)
)
;
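
To check which island each row falls into, the same pattern can be run with ALL ROWS PER MATCH. This is only a sketch for inspection, assuming Oracle 12.1 or higher; the aliases ISLAND_NO and MATCHED_AS are made up for illustration. MATCH_NUMBER() and CLASSIFIER() are the built-in row pattern functions that report the match sequence number and the pattern variable a row was matched to.

```sql
-- Same PATTERN and DEFINE as above, but every input row is returned,
-- labelled with the number of the island (match) it belongs to.
select run_date, island_no, matched_as
from   running_log
match_recognize(
  order    by run_date
  measures match_number() as island_no,   -- 1, 2, 3, ... one number per island
           classifier()   as matched_as   -- A for the first row of an island, B for the rest
  all rows per match
  pattern  ( a b* )
  define   b as run_date < add_months(trunc(prev(run_date), 'mm'), 2)
)
order  by run_date
;
```

A row is classified as B when its date falls before the first day of the second month after the previous row's month, that is, in the same month or the one immediately following. Any other row fails that condition and starts a new match as A.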

Answer 3

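Check the SQL below, which uses the tabibitosan method to find gaps. The result treats count = 1 as a LAKE and count > 1 as an ISLAND.

select min(run_date) as first_date,
       max(run_date) as last_date,
       count(grp)    as day_count,
       decode(count(grp), 1, 'LAKE', 'ISLAND') as kind
from (
  -- row_number() with an explicit order replaces rownum, which is assigned
  -- before ORDER BY is applied; run_date minus its row number is constant
  -- across consecutive calendar days
  select run_date, run_date - row_number() over (order by run_date) as grp
  from   running_log
)
group by grp
order by first_date;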

sql

oracle

oracle11g

gaps-and-islands