Oracle - filtering a Cartesian product

I have a mating_history table:

id    cage_id   code    event_date                    animal_id
---------------------------------------------------------------
100   4163      FA      03-Aug-2016 10.51.55.000 AM   3570
101   4163      MA      03-Aug-2016 10.52.13.000 AM   2053
102   4163      MR      29-Aug-2016 10.23.24.000 AM   2053
103   4163      MA      11-Oct-2016 12.50.02.000 PM   5882
104   4163      MR      31-Oct-2016 01.37.28.000 PM   5882
105   4163      MA      07-Nov-2016 01.27.58.000 PM   5882
106   4163      FA      19-Apr-2017 11.46.50.000 AM   6011
107   4163      FA      19-Apr-2017 11.48.31.000 AM   6010

Legend:

MA = Male added to cage
MR = Male removed from cage
FA = Female added to cage
FR = Female removed from cage

In the table above, the first row says that on event_date a female animal (id 3570) was added to the cage for breeding.


If you follow the history log, you get these points of "actual mating":

female_id    male_id    event_date
-----------------------------------------------------------------
3570         2053       03-Aug-2016 10.52.13.000 AM
3570         5882       11-Oct-2016 12.50.02.000 PM
3570         5882       07-Nov-2016 01.27.58.000 PM
6011         5882       19-Apr-2017 11.46.50.000 AM
6010         5882       19-Apr-2017 11.48.31.000 AM

However, when I tried to translate this idea into SQL, I did not get the result shown above.

SQL

SELECT
  be.cage_id, be.code AS base_code, be.animal_id AS base_animal, be.event_date AS base_date,
  se.code AS sub_code, se.animal_id AS sub_animal, se.event_date AS sub_date
FROM mating_history be
  LEFT JOIN mating_history se ON se.cage_id = be.cage_id
WHERE be.cage_id = 4163
  AND be.code != se.code
  AND be.code IN ('MA', 'FA')
  AND se.code IN ('MA', 'FA')
  AND be.event_date < se.event_date
ORDER BY be.event_date ASC, se.event_date ASC

Result

cage_id    base_code   base_animal    base_date                    sub_code    sub_animal     sub_date
--------------------------------------------------------------------------------------------------------------------
4163       FA          3570           03-Aug-2016 10.51.55.000 AM  MA          2053           03-Aug-2016 10.52.13.000 AM
4163       FA          3570           03-Aug-2016 10.51.55.000 AM  MA          5882           11-Oct-2016 12.50.02.000 PM
4163       FA          3570           03-Aug-2016 10.51.55.000 AM  MA          5882           07-Nov-2016 01.27.58.000 PM
4163       MA          2053           03-Aug-2016 10.52.13.000 AM  FA          6011           19-Apr-2017 11.46.50.000 AM --------> WRONG
4163       MA          2053           03-Aug-2016 10.52.13.000 AM  FA          6010           19-Apr-2017 11.48.31.000 AM --------> WRONG
4163       MA          5882           11-Oct-2016 12.50.02.000 PM  FA          6011           19-Apr-2017 11.46.50.000 AM --------> WRONG
4163       MA          5882           11-Oct-2016 12.50.02.000 PM  FA          6010           19-Apr-2017 11.48.31.000 AM --------> WRONG
4163       MA          5882           07-Nov-2016 01.27.58.000 PM  FA          6011           19-Apr-2017 11.46.50.000 AM
4163       MA          5882           07-Nov-2016 01.27.58.000 PM  FA          6010           19-Apr-2017 11.48.31.000 AM

I don't know how to get the 5 rows I need. How can I filter the result further so that, in this case, only those 5 rows remain?

Optional: is creating a Cartesian product the best way to achieve what I want? Is there a better approach?

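Let's keep track of who is in the cage, assuming there is only one male and one female at a time. The following gets, for each change, the animals that were in the cage just before it: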

select mh.*,
       -- male in the cage before this event: the previous male event must be an 'MA'
       (case when 'MA' = lag(case when code in ('MA', 'MR') then code end ignore nulls) over (partition by cage_id order by event_date)
             then lag(case when code in ('MA') then animal_id end ignore nulls) over (partition by cage_id order by event_date)
        end) as male_animal,
       -- female in the cage before this event: the previous female event must be an 'FA'
       (case when 'FA' = lag(case when code in ('FA', 'FR') then code end ignore nulls) over (partition by cage_id order by event_date)
             then lag(case when code in ('FA') then animal_id end ignore nulls) over (partition by cage_id order by event_date)
        end) as female_animal,
       lead(event_date) over (partition by cage_id order by event_date) as next_event_date
from mating_history mh;

You want the rows where both animals are present:

select mh.*
from (select mh.*,
             (case when 'MA' = lag(case when code in ('MA', 'MR') then code end ignore nulls) over (partition by cage_id order by event_date)
                   then lag(case when code in ('MA') then animal_id end ignore nulls) over (partition by cage_id order by event_date)
              end) as male_animal,
             (case when 'FA' = lag(case when code in ('FA', 'FR') then code end ignore nulls) over (partition by cage_id order by event_date)
                   then lag(case when code in ('FA') then animal_id end ignore nulls) over (partition by cage_id order by event_date)
              end) as female_animal,
             lead(event_date) over (partition by cage_id order by event_date) as next_event_date
      from mating_history mh
     ) mh
where male_animal is not null and female_animal is not null;
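
Because lag() only looks at earlier rows, the query above reports who was in the cage before each event. A possible variant, not part of the original answer, looks at the state up to and including the current row and keeps only the "added" events, which is closer to the female_id / male_id / event_date shape the question asks for. This is only a sketch: it assumes event timestamps are distinct within a cage, and it pairs each newly added animal only with the most recently added animal of the other sex, so it does not handle several animals of the same sex sharing the cage when an animal of the other sex arrives.

select female_id, male_id, event_date
from (select mh.code, mh.event_date,
             -- male present after this event: the latest male event so far must be an 'MA'
             (case when 'MA' = last_value(case when code in ('MA', 'MR') then code end ignore nulls)
                                 over (partition by cage_id order by event_date
                                       rows between unbounded preceding and current row)
                   then last_value(case when code = 'MA' then animal_id end ignore nulls)
                          over (partition by cage_id order by event_date
                                rows between unbounded preceding and current row)
              end) as male_id,
             -- female present after this event: the latest female event so far must be an 'FA'
             (case when 'FA' = last_value(case when code in ('FA', 'FR') then code end ignore nulls)
                                 over (partition by cage_id order by event_date
                                       rows between unbounded preceding and current row)
                   then last_value(case when code = 'FA' then animal_id end ignore nulls)
                          over (partition by cage_id order by event_date
                                rows between unbounded preceding and current row)
              end) as female_id
      from mating_history mh
     )
-- a pairing starts only when an animal is added while the other sex is present
where code in ('MA', 'FA')
  and male_id is not null
  and female_id is not null
order by event_date;

The explicit "rows between unbounded preceding and current row" frame is what lets last_value() include the event on the current row, which lag() cannot do.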

This might work:

Setup:

create table mating_history (
      id         number    primary key
    , cage_id    number    not null
    , code       char(2)   check (code in ('FA', 'FR', 'MA', 'MR'))
    , event_date timestamp not null
    , animal_id  number    not null
);

insert into mating_history
  select 100, 4163, 'FA', timestamp '2016-08-03 10:51:55', 3570 from dual union all
  select 101, 4163, 'MA', timestamp '2016-08-03 10:52:13', 2053 from dual union all
  select 102, 4163, 'MR', timestamp '2016-08-29 10:23:24', 2053 from dual union all
  select 103, 4163, 'MA', timestamp '2016-10-11 12:50:02', 5882 from dual union all
  select 104, 4163, 'MR', timestamp '2016-10-31 13:37:28', 5882 from dual union all
  select 105, 4163, 'MA', timestamp '2016-11-07 13:27:58', 5882 from dual union all
  select 106, 4163, 'FA', timestamp '2017-04-19 11:46:50', 6011 from dual union all
  select 107, 4163, 'FA', timestamp '2017-04-19 11:48:31', 6010 from dual
;

commit;

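This data model is bad in several ways. There should be small "dimension" tables for cages and for animals. The animals table should show the sex (rather than encoding it in the "code" column of the current table). For now, I assume the data is as you provided it and that you are not inclined to fix the data model.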
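
For illustration only, the kind of model meant here could look roughly like the sketch below. The table and column names (cages, animals, cage_events, sex, event_code) are hypothetical and do not come from the question; the rest of this answer keeps using the original mating_history table.

-- Hypothetical dimension tables; names and columns are illustrative only.
create table cages (
      cage_id    number    primary key
);

create table animals (
      animal_id  number    primary key
    , sex        char(1)   not null check (sex in ('F', 'M'))
);

-- With the sex stored on the animal, the event log only has to record
-- whether the animal was added ('A') or removed ('R').
create table cage_events (
      id         number    primary key
    , cage_id    number    not null references cages (cage_id)
    , animal_id  number    not null references animals (animal_id)
    , event_code char(1)   not null check (event_code in ('A', 'R'))
    , event_date timestamp not null
);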

Query:

with
     grouped ( cage_id, sex, event_code, event_date, animal_id, grp ) as (
       select cage_id, substr(code, 1, 1), substr(code, 2), 
              event_date, animal_id,
              row_number() over (partition by animal_id, code order by event_date) 
       from   mating_history
     ),
     pivoted as (
       select *
       from   grouped
       pivot  ( max(event_date) for event_code in ('A' as a, 'R' as r) )
     )
select   f.animal_id as female_id,
         m.animal_id as male_id,
         greatest(f.a, m.a) as event_date
from     ( select * from pivoted where sex = 'F' ) f
         join
         ( select * from pivoted where sex = 'M' ) m
         on     f.cage_id = m.cage_id
            and ( f.r >= m.a or f.r is null )
            and ( m.r >= f.a or m.r is null )
order by event_date, female_id, male_id
;

Output (the event_date column uses my current NLS_TIMESTAMP_FORMAT):

 FEMALE_ID   MALE_ID    EVENT_DATE                             
 ---------   -------    ------------------------------
      3570      2053    03-AUG-2016 10.52.13.000000000          
      3570      5882    11-OCT-2016 12.50.02.000000000          
      3570      5882    07-NOV-2016 13.27.58.000000000          
      6011      5882    19-APR-2017 11.46.50.000000000          
      6010      5882    19-APR-2017 11.48.31.000000000
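
To see why the join in the query above works, it may help to look at the pivoted rows on their own. The sketch below reuses the grouped subquery unchanged and only adds an ORDER BY for readability. Each resulting row is one stay of one animal in the cage, with a holding the time it was added and r the time it was removed (null if it was never removed).

with
     grouped ( cage_id, sex, event_code, event_date, animal_id, grp ) as (
       select cage_id, substr(code, 1, 1), substr(code, 2),
              event_date, animal_id,
              -- number each animal's adds and removes separately, so the
              -- n-th add lines up with the n-th remove
              row_number() over (partition by animal_id, code order by event_date)
       from   mating_history
     )
select   *
from     grouped
pivot    ( max(event_date) for event_code in ('A' as a, 'R' as r) )
order by animal_id, grp
;

With the sample data, animal 5882 gets two rows (grp 1 with both a and r filled in, grp 2 with r null), and every other animal gets one. The join conditions in the main query are then an interval-overlap test between a female stay and a male stay, and greatest(f.a, m.a) is the moment the later of the two animals arrived. Pairing adds with removes by row_number assumes that an animal is always removed before it is added again.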