Identifying a period with a particular characteristic using SQL

I want to write a SQL query that finds the longest period during which a person did not eat meat. Ideally, the output would look like

person  periodstart  periodend 

giving the longest meat-free period for each person, where

periodstart is the time of the first non-meat meal

periodend is the time of the first meat meal that follows.

The following SQL creates the table and data.

CREATE TABLE MEALS 
(
  PERSON VARCHAR2(20 BYTE) 
, MEALTIME DATE 
, FOODTYPE VARCHAR2(20) 
);

Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('04-JAN-15 06:09:09','DD-MON-RR HH24:MI:SS'),'fruit');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('05-JAN-15 06:09:09','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('07-JAN-15 06:01:24','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('07-JAN-15 12:03:50','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('02-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('03-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('04-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('05-JAN-15 07:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('05-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'fruit');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('06-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('02-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('03-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('05-JAN-15 04:04:25','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('05-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('05-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('07-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'veg');

commit;

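Here is a solution in SQL Server; hopefully it is easy to follow.

-- x: every meal, numbered per person in time order
with x as (
  select row_number() over (partition by person order by MealTime) rowId, *
  from #MEALS
)
-- y: meat meals only, numbered per person in time order
, y as (
  select row_number() over (partition by person order by MealTime) rowID, *
  from #MEALS
  where FOODTYPE = 'meat'
)
-- Pairs each person's first meal with their first meat meal,
-- so it reports only the initial period, not the longest one.
select x.PERSON, x.MEALTIME startdate, y.MEALTIME endDate,
  datediff(second, x.MEALTIME, y.MEALTIME) diff
from x
left join y on x.PERSON = y.PERSON
where x.rowId = 1 and y.rowID = 1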

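This is a gaps-and-islands problem, and there are several ways to approach it. One is to use an analytic function effect/trick to find the chains of contiguous periods for each type: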

select person, mealtime, foodtype,
  case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
  dense_rank() over (partition by person,
      case when foodtype = 'meat' then 1 else 0 end order by mealtime)
    - dense_rank() over (partition by person order by mealtime) as chain
from meals
order by person, mealtime;

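The 'chain' pseudo-column is based on the case expression here because you want fruit and veg, or anything else that isn't meat, to be treated the same.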
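To see how the rank difference behaves, here is a minimal sketch that runs the same expression on Jane's four meals. It uses Python's built-in sqlite3 module rather than Oracle, which is an assumption made only to get a runnable example: it needs an SQLite build with window functions (3.25 or later), and the timestamps are stored as ISO text so that text order matches time order.

```python
import sqlite3

# Jane's four meals from the question, as ISO timestamps.
rows = [
    ("Jane", "2015-01-04 06:09:09", "fruit"),
    ("Jane", "2015-01-05 06:09:09", "veg"),
    ("Jane", "2015-01-07 06:01:24", "meat"),
    ("Jane", "2015-01-07 12:03:50", "veg"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table meals (person text, mealtime text, foodtype text)")
conn.executemany("insert into meals values (?, ?, ?)", rows)

# Rank within the meat/non-meat group minus rank over all meals.
chains = conn.execute("""
    select mealtime, foodtype,
      dense_rank() over (partition by person,
          case when foodtype = 'meat' then 1 else 0 end order by mealtime)
        - dense_rank() over (partition by person order by mealtime) as chain
    from meals
    order by person, mealtime
""").fetchall()

for row in chains:
    print(row)
```

The first two non-meat meals share chain 0, the meat meal gets -2, and the final veg meal gets -1. The number itself means nothing; it only has to stay constant within a run of meals of the same kind and change when a meal of the other kind interrupts the run.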

You can then use that as an inner query to find the start of each meat and non-meat period, taking the first meal in each chain:

select person, meat, min(mealtime) as first_meal
from (
  select person, mealtime, foodtype,
    case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
    dense_rank() over (partition by person,
        case when foodtype = 'meat' then 1 else 0 end order by mealtime)
      - dense_rank() over (partition by person order by mealtime) as chain
  from meals
)
group by person, meat, chain
order by person, min(mealtime);

PERSON               MEAT FIRST_MEAL       
-------------------- ---- ------------------
Jane                 No   04-JAN-15 06:09:09 
Jane                 Yes  07-JAN-15 06:01:24 
Jane                 No   07-JAN-15 12:03:50 
John                 No   02-JAN-15 10:03:23 
...

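You want the period to run from the first non-meat meal to the next meat meal, so you can use that as an inner query with lead and lag to peek at the rows on either side: for a veg period you look ahead to the start of the next meat period, and for a meat period you look back to the start of the previous veg period: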

select person, meat,
  case when meat = 'Yes' then lag(first_meal) over (partition by person
      order by first_meal) else first_meal end as period_start,
  case when meat = 'No' then lead(first_meal) over (partition by person
      order by first_meal) else first_meal end as period_end
from (
  select person, meat, min(mealtime) as first_meal
  from (
    select person, mealtime, foodtype,
      case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
      dense_rank() over (partition by person,
          case when foodtype = 'meat' then 1 else 0 end order by mealtime)
        - dense_rank() over (partition by person order by mealtime) as chain
    from meals
  )
  group by person, meat, chain
)
order by person, period_start;

PERSON               MEAT PERIOD_START       PERIOD_END       
-------------------- ---- ------------------ ------------------
Jane                 No   04-JAN-15 06:09:09 07-JAN-15 06:01:24 
Jane                 Yes  04-JAN-15 06:09:09 07-JAN-15 06:01:24 
Jane                 No   07-JAN-15 12:03:50                    
John                 No   02-JAN-15 10:03:23 03-JAN-15 10:03:23 
...
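The lead/lag step can be checked in isolation. The sketch below feeds Jane's three period starts from the previous output into a hypothetical `starts` table (a name introduced here for illustration only) and applies the same two case expressions. As before, SQLite via Python's sqlite3 stands in for Oracle, which is an assumption, not part of the original answer.

```python
import sqlite3

# Jane's period starts, as produced by the grouping step.
starts = [
    ("Jane", "No", "2015-01-04 06:09:09"),
    ("Jane", "Yes", "2015-01-07 06:01:24"),
    ("Jane", "No", "2015-01-07 12:03:50"),
]

conn = sqlite3.connect(":memory:")
# 'starts' is a hypothetical helper table for this sketch.
conn.execute("create table starts (person text, meat text, first_meal text)")
conn.executemany("insert into starts values (?, ?, ?)", starts)

# Meat rows look back for their start; non-meat rows look ahead for their end.
periods = conn.execute("""
    select meat,
      case when meat = 'Yes' then lag(first_meal) over (partition by person
          order by first_meal) else first_meal end as period_start,
      case when meat = 'No' then lead(first_meal) over (partition by person
          order by first_meal) else first_meal end as period_end
    from starts
    order by first_meal
""").fetchall()

for row in periods:
    print(row)
```

The first 'No' row and the 'Yes' row come out with the same start and end, and the last 'No' row has no end because no meat meal follows it yet.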

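I've left the 'meat' flag in here to make it a bit clearer, but it effectively gives you duplicates. Assuming you want to ignore the latest open-ended periods, you can just skip those and eliminate the duplicates: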

select person, period_start, period_end
from (
  select person, meat,
    case when meat = 'Yes' then lag(first_meal) over (partition by person
        order by first_meal) else first_meal end as period_start,
    case when meat = 'No' then lead(first_meal) over (partition by person
        order by first_meal) else first_meal end as period_end
  from (
    select person, meat, min(mealtime) as first_meal
    from (
      select person, mealtime, foodtype,
        case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
        dense_rank() over (partition by person,
            case when foodtype = 'meat' then 1 else 0 end order by mealtime)
          - dense_rank() over (partition by person order by mealtime) as chain
      from meals
    )
    group by person, meat, chain
  )
)
where meat = 'No'
and period_start is not null
and period_end is not null
order by person, period_start;

PERSON               PERIOD_START       PERIOD_END       
-------------------- ------------------ ------------------
Jane                 04-JAN-15 06:09:09 07-JAN-15 06:01:24 
John                 02-JAN-15 10:03:23 03-JAN-15 10:03:23 
John                 04-JAN-15 10:03:23 06-JAN-15 10:03:23 
Mary                 02-JAN-15 05:01:54 03-JAN-15 06:04:25 
Mary                 05-JAN-15 04:04:25 05-JAN-15 06:04:25 

SQL Fiddle with the intermediate steps in full.

Having realised afterwards that you only want the longest period for each person, you can get that with another layer:

select person, period_start, period_end
from (
  select person, period_start, period_end,
    rank() over (partition by person order by period_end - period_start desc) as rnk
  from (
    ...
  )
  where meat = 'No'
  and period_start is not null
  and period_end is not null
)
where rnk = 1
order by person, period_start;

PERSON               PERIOD_START       PERIOD_END       
-------------------- ------------------ ------------------
Jane                 04-JAN-15 06:09:09 07-JAN-15 06:01:24 
John                 04-JAN-15 10:03:23 06-JAN-15 10:03:23 
Mary                 02-JAN-15 05:01:54 03-JAN-15 06:04:25 

Updated SQL Fiddle.
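For anyone who wants to run the whole thing end to end without an Oracle instance, here is a sketch under stated assumptions. It uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), stores the timestamps as ISO text, and replaces Oracle's date subtraction with SQLite's `julianday()`. The nested subqueries are rewritten as CTEs, and because only the non-meat rows survive the final filter, the lag branch is left out. None of these changes come from the original answer.

```python
import sqlite3

# Sample data from the question, as ISO timestamps.
MEALS = [
    ("Jane", "2015-01-04 06:09:09", "fruit"),
    ("Jane", "2015-01-05 06:09:09", "veg"),
    ("Jane", "2015-01-07 06:01:24", "meat"),
    ("Jane", "2015-01-07 12:03:50", "veg"),
    ("John", "2015-01-02 10:03:23", "veg"),
    ("John", "2015-01-03 10:03:23", "meat"),
    ("John", "2015-01-04 10:03:23", "veg"),
    ("John", "2015-01-05 07:03:23", "veg"),
    ("John", "2015-01-05 10:03:23", "veg"),
    ("John", "2015-01-06 05:01:54", "veg"),
    ("John", "2015-01-06 05:01:54", "fruit"),
    ("John", "2015-01-06 10:03:23", "meat"),
    ("Mary", "2015-01-02 05:01:54", "veg"),
    ("Mary", "2015-01-03 06:04:25", "meat"),
    ("Mary", "2015-01-05 04:04:25", "veg"),
    ("Mary", "2015-01-05 06:04:25", "meat"),
    ("Mary", "2015-01-05 06:04:25", "meat"),
    ("Mary", "2015-01-06 05:01:54", "veg"),
    ("Mary", "2015-01-07 06:04:25", "veg"),
]

QUERY = """
with chained as (      -- tag each meal with its chain
  select person, mealtime,
    case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
    dense_rank() over (partition by person,
        case when foodtype = 'meat' then 1 else 0 end order by mealtime)
      - dense_rank() over (partition by person order by mealtime) as chain
  from meals
),
starts as (            -- first meal of each chain
  select person, meat, min(mealtime) as first_meal
  from chained
  group by person, meat, chain
),
periods as (           -- each period ends where the next one starts
  select person, meat, first_meal as period_start,
    lead(first_meal) over (partition by person order by first_meal) as period_end
  from starts
),
ranked as (            -- closed non-meat periods, longest first
  select person, period_start, period_end,
    rank() over (partition by person
        order by julianday(period_end) - julianday(period_start) desc) as rnk
  from periods
  where meat = 'No' and period_end is not null
)
select person, period_start, period_end
from ranked
where rnk = 1
order by person, period_start
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table meals (person text, mealtime text, foodtype text)")
conn.executemany("insert into meals values (?, ?, ?)", MEALS)

longest = conn.execute(QUERY).fetchall()
for row in longest:
    print(row)
```

This returns one row per person with the same periods as the output above. Because it uses `rank()`, a person with two equally long periods would get both rows.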