How to recursively create pedigrees and find matches for inbreeding detection (Oracle)

Question

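I'm having a hard time creating a function in Oracle that determines whether mating 2 animals would produce inbreeding. The function should take 3 parameters: the male ID, the female ID, and the depth to look at.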

At first I thought I should create two pedigrees using data from a table with the following structure:

    TABLE animal
    +-----+---------+--------+
    | ID  | SIRE_ID | DAM_ID |
    +-----+---------+--------+
    | 111 | 112     | 212    |
    | 112 | 113     | 213    |
    | 212 | 116     | 216    |
    +-----+---------+--------+

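(Not entirely relevant, but for this and the following examples I use IDs where 1?? is a male and 2?? is a female.) For this I should use the depth parameter, probably recursively.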

This is what I have so far:

function animal_pedigree (p_id number,
                          p_max_pedigree_level number,
                          p_pedigree_level number := 0,
                          p_position varchar2 := '') return animal_ancestors_table
    pipelined
is
    v_sire_id number;
    v_dam_id  number;
    v_row     animal_ancestor;
begin
    -- Emit the current animal; position is its path from the root ('s' = sire, 'd' = dam)
    v_row.id             := p_id;
    v_row.pedigree_level := p_pedigree_level;
    v_row.position       := p_position;
    pipe row (v_row);
    if p_pedigree_level < p_max_pedigree_level then
        begin
            select sire_id, dam_id
            into v_sire_id, v_dam_id
            from arc.animal
            where id = p_id;
        exception
            -- An ancestor without a row of its own has no known parents
            when no_data_found then
                v_sire_id := null;
                v_dam_id  := null;
        end;
        -- Recurse into the sire's pedigree and re-emit its rows
        if v_sire_id is not null then
            for rec in (select id, pedigree_level, position
                        from table(animal_pedigree(v_sire_id, p_max_pedigree_level,
                                                   p_pedigree_level + 1, p_position || 's'))) loop
                v_row.id             := rec.id;
                v_row.pedigree_level := rec.pedigree_level;
                v_row.position       := rec.position;
                pipe row (v_row);
            end loop;
        end if;
        -- Same for the dam's pedigree
        if v_dam_id is not null then
            for rec in (select id, pedigree_level, position
                        from table(animal_pedigree(v_dam_id, p_max_pedigree_level,
                                                   p_pedigree_level + 1, p_position || 'd'))) loop
                v_row.id             := rec.id;
                v_row.pedigree_level := rec.pedigree_level;
                v_row.position       := rec.position;
                pipe row (v_row);
            end loop;
        end if;
    end if;
    return;
end;
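To make the recursion easier to follow outside the database, here is a minimal Python sketch of what `animal_pedigree` emits. It assumes the `animal` table is modeled as a dict mapping each ID to a `(sire_id, dam_id)` pair; the dict and the Python function are illustrative only. Like the PL/SQL version, it emits the animal itself first, then the sire's subtree, then the dam's, and builds `position` by appending `s` or `d`.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical in-memory stand-in for the animal table: id -> (sire_id, dam_id)
Animals = Dict[int, Tuple[Optional[int], Optional[int]]]


def animal_pedigree(animals: Animals, animal_id: int, max_level: int,
                    level: int = 0, position: str = "") -> Iterator[Tuple[int, int, str]]:
    """Yield (id, pedigree_level, position) in the same order as the PL/SQL function."""
    yield animal_id, level, position
    if level < max_level:
        # An animal without a row of its own is treated as having unknown parents
        sire_id, dam_id = animals.get(animal_id, (None, None))
        if sire_id is not None:
            yield from animal_pedigree(animals, sire_id, max_level, level + 1, position + "s")
        if dam_id is not None:
            yield from animal_pedigree(animals, dam_id, max_level, level + 1, position + "d")


# The three rows from the sample table above
ANIMAL = {111: (112, 212), 112: (113, 213), 212: (116, 216)}

for row in animal_pedigree(ANIMAL, 111, 2):
    print(row)
```

The `position` string doubles as a path, so `'sd'` reads as "sire's dam". This can help when reporting where a match sits in each tree.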

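After that comes the part that is tricky for me: comparing the pedigrees to find matching IDs (and remembering the depth at which each match was found).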

Ultimately I want to return the smallest depth at which inbreeding is found, or 0 when none is found.

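Note! I only want to compare the two pedigrees with each other, not IDs within one of them. (If inbreeding already exists, I want it ignored; I'm only interested in newly formed inbreeding.)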

To clarify further, I've added 3 examples. Matches are marked with * (asterisk).

Example 1:

Male pedigree

Depth           1       2       3

                            |--114
                    |--113--|
                    |       |--214
            |--112--|       
            |       |       |--115
            |       |--213--|
            |               |--215
       111--|
            |               |--117
            |       |--116--|
            |       |       |--217
            |--212--|
                    |       |--118
                    |--216--|
                            |--218

Female pedigree

Depth           1       2       3

                            |--124
                    |--123--|
                    |       |--224
            |--122--|       
            |       |       |--125
            |       |--223--|
            |               |--225
       211--|
            |               |--127
            |       |--126--|
            |       |       |--227
            |--222--|
                    |       |--128
                    |--226--|
                            |--228

[RETURN 0] No matching IDs found

Example 2:

Male pedigree

Depth           1       2       3

                            |--114*
                    |--113--|
                    |       |--214
            |--112--|       
            |       |       |--115
            |       |--213--|
            |               |--215
       111--|
            |               |--117
            |       |--116--|
            |       |       |--217
            |--212--|
                    |       |--114*
                    |--216--|
                            |--218

Female pedigree

Depth           1       2       3

                            |--124
                    |--123--|
                    |       |--224
            |--122--|       
            |       |       |--125
            |       |--223--|
            |               |--225
       211--|
            |               |--127
            |       |--126--|
            |       |       |--227
            |--222--|
                    |       |--128
                    |--226--|
                            |--228

[RETURN 0] Both matching IDs are in the male pedigree. Ignore.

Example 3:

Male pedigree

Depth           1       2       3

                            |--114*
                    |--113--|
                    |       |--214
            |--112--|       
            |       |       |--115
            |       |--213--|
            |               |--215
       111--|
            |               |--117
            |       |--116--|
            |       |       |--217
            |--212--|
                    |       |--118
                    |--216--|
                            |--218

Female pedigree

Depth           1       2       3

                            |--124
                    |--123--|
                    |       |--224
            |--122--|       
            |       |       |--125
            |       |--223--|
            |               |--225
       211--|
            |               |--127
            |       |--114*-|
            |       |       |--227
            |--222--|
                    |       |--128
                    |--226--|
                            |--228

[RETURN 2] Matching IDs found at depth 3 in the male pedigree and depth 2 in the female pedigree
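The comparison described above can be sketched in Python. The sketch assumes "smallest depth" means the shallower of the two depths at which a shared ancestor appears, which is what Example 3's RETURN 2 suggests. Each pedigree is reduced to a map from ancestor ID to the shallowest depth at which it occurs. Only IDs present in both maps count, so a repeat inside one pedigree (Example 2) is ignored.

The dict data model and the function names are hypothetical. The two mated animals themselves (depth 0) are not compared, so this sketch would not flag, for example, a male mated to his own daughter.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the animal table: id -> (sire_id, dam_id)
Animals = Dict[int, Tuple[Optional[int], Optional[int]]]


def ancestor_depths(animals: Animals, root: int, max_depth: int) -> Dict[int, int]:
    """Map every ancestor of root (depths 1..max_depth) to the shallowest depth it appears at."""
    depths: Dict[int, int] = {}
    generation = [root]
    for depth in range(1, max_depth + 1):
        next_generation = []
        for animal_id in generation:
            for parent in animals.get(animal_id, (None, None)):
                if parent is not None:
                    depths.setdefault(parent, depth)  # generations are visited shallowest first
                    next_generation.append(parent)
        generation = next_generation
    return depths


def inbreeding_depth(animals: Animals, male_id: int, female_id: int, max_depth: int) -> int:
    """Smallest depth at which the two pedigrees share an ID, or 0 if they share none."""
    male = ancestor_depths(animals, male_id, max_depth)
    female = ancestor_depths(animals, female_id, max_depth)
    shared = male.keys() & female.keys()  # repeats inside a single pedigree are ignored
    return min((min(male[a], female[a]) for a in shared), default=0)


# The three examples above, encoded as parent pairs
BASE = {111: (112, 212), 112: (113, 213), 212: (116, 216), 113: (114, 214),
        213: (115, 215), 116: (117, 217), 216: (118, 218),
        211: (122, 222), 122: (123, 223), 222: (126, 226), 123: (124, 224),
        223: (125, 225), 126: (127, 227), 226: (128, 228)}
EXAMPLE_1 = BASE
EXAMPLE_2 = {**BASE, 216: (114, 218)}                  # 114 appears twice in the male pedigree
EXAMPLE_3 = {**BASE, 222: (114, 226), 114: (127, 227)}  # 114 replaces 126 as sire of 222

for example in (EXAMPLE_1, EXAMPLE_2, EXAMPLE_3):
    print(inbreeding_depth(example, 111, 211, 3))
```

Walking generation by generation means the first depth recorded for an ID is always its shallowest. The depth parameter simply stops the walk.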

Answer 1

You can use recursive CTEs to find the matching ancestors.

This example is untested, since you didn't provide a script to create sample data. Anyway, this query should work:

with
l (id, aid, sire_id, dam_id, lvl) as (
  select id, id, sire_id, dam_id, 0 from animal where id = 111 -- male ID
  union all
  select l.id, a.id, a.sire_id, a.dam_id, l.lvl + 1
  from l
  join animal a on a.id in (l.sire_id, l.dam_id)
),
r (id, aid, sire_id, dam_id, lvl) as (
  select id, id, sire_id, dam_id, 0 from animal where id = 211 -- female ID
  union all
  select r.id, a.id, a.sire_id, a.dam_id, r.lvl + 1
  from r
  join animal a on a.id in (r.sire_id, r.dam_id)
)
select 
  l.id as male_id, l.aid as male_ancestor_id, l.lvl as male_ancestor_depth,
  r.id as female_id, r.aid as female_ancestor_id, r.lvl as female_ancestor_depth
from l
join r on r.aid = l.aid

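This query returns the matches in all combinations (there can be more than one). You may want to add further changes to remove duplicate matches, since one animal can be the ancestor of several animals already in each tree.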

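Also, the main query shows all the details of each match, including the matching ancestor and its depth in each tree. You can easily modify it to show only the depth (as you want). Or... you can extend it to show the "full path" to each ancestor. It depends on the exact output you want. I bet that once you see the results, you'll want to know more about them.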
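Because the corrected query uses only standard recursive CTE syntax, it can be tried without an Oracle instance. The sketch below runs it in Python's built-in `sqlite3` module on the data from Example 3. It spells the clause `WITH RECURSIVE`, the standard form SQLite accepts; Oracle 11gR2 and later use plain `WITH`.

The sketch also adds things that are not in the answer:

* a `lvl < :max_depth` filter in each recursive member, which implements the question's depth parameter;
* a final `SELECT` that skips the two animals themselves and reduces the matches to the single number the question asks for;
* the hypothetical helper names `load_animals` and `inbreeding_depth`.

The recursive join only reaches ancestors that have a row of their own in `animal`, so every founder is inserted with NULL parents.

```python
import sqlite3

# Answer 1's CTEs, with the recursive members selecting all five declared columns.
# The lvl < :max_depth filters are an added assumption implementing the depth parameter.
PEDIGREE_CTES = """
WITH RECURSIVE
l (id, aid, sire_id, dam_id, lvl) AS (
  SELECT id, id, sire_id, dam_id, 0 FROM animal WHERE id = :male_id
  UNION ALL
  SELECT l.id, a.id, a.sire_id, a.dam_id, l.lvl + 1
  FROM l JOIN animal a ON a.id IN (l.sire_id, l.dam_id)
  WHERE l.lvl < :max_depth
),
r (id, aid, sire_id, dam_id, lvl) AS (
  SELECT id, id, sire_id, dam_id, 0 FROM animal WHERE id = :female_id
  UNION ALL
  SELECT r.id, a.id, a.sire_id, a.dam_id, r.lvl + 1
  FROM r JOIN animal a ON a.id IN (r.sire_id, r.dam_id)
  WHERE r.lvl < :max_depth
)
"""


def load_animals(conn, parents):
    """Create the animal table; parents without an entry become founders with NULL parents."""
    conn.execute("CREATE TABLE animal (id INTEGER PRIMARY KEY, sire_id INTEGER, dam_id INTEGER)")
    ids = set(parents) | {p for pair in parents.values() for p in pair if p is not None}
    rows = [(i, *parents.get(i, (None, None))) for i in sorted(ids)]
    conn.executemany("INSERT INTO animal VALUES (?, ?, ?)", rows)


def inbreeding_depth(conn, male_id, female_id, max_depth):
    """Shallowest depth (the smaller of the two) of a shared ancestor, or 0 if there is none."""
    sql = PEDIGREE_CTES + """
    SELECT COALESCE(MIN(CASE WHEN l.lvl < r.lvl THEN l.lvl ELSE r.lvl END), 0)
    FROM l JOIN r ON r.aid = l.aid
    WHERE l.lvl > 0 AND r.lvl > 0  -- compare ancestors only, not the two animals themselves
    """
    params = {"male_id": male_id, "female_id": female_id, "max_depth": max_depth}
    return conn.execute(sql, params).fetchone()[0]


# Example 3: 114 sires 113 (male pedigree) and 222 (female pedigree)
EXAMPLE_3 = {111: (112, 212), 112: (113, 213), 212: (116, 216), 113: (114, 214),
             213: (115, 215), 116: (117, 217), 216: (118, 218),
             211: (122, 222), 122: (123, 223), 222: (114, 226), 123: (124, 224),
             223: (125, 225), 114: (127, 227), 226: (128, 228)}

conn = sqlite3.connect(":memory:")
load_animals(conn, EXAMPLE_3)
print(inbreeding_depth(conn, 111, 211, 3))
```

The `CASE` expression plays the role of Oracle's `least()`. Wrapping `MIN` in `COALESCE` turns "no shared ancestor" into the 0 the question wants.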

Answer 2

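Since you're using 10g rather than a newer version, you'll need to use Oracle's hierarchical queries instead of the recursive common table expressions shown by The Impaler. For my solution to work, it helps to encode the animal's gender as a separate column rather than embedding it in the animal ID, so I'll use the following table definition. (Note: I don't have a 10g instance to try this on, so I'm not sure whether deferrable constraints are available in 10g. If they aren't, just remove those clauses; they only make loading the sample data easier.):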

CREATE TABLE animal
    ( ID number not null primary key
    , GENDER varchar2(1) not null
    , SIRE_ID number
    , DAM_ID number
    , constraint animal_gender check (gender in ('M','F'))
    , constraint animal_sire_fk FOREIGN KEY (sire_id) REFERENCES animal(id) DEFERRABLE INITIALLY DEFERRED
    , constraint animal_dam_fk FOREIGN KEY (dam_id) REFERENCES animal(id) DEFERRABLE INITIALLY DEFERRED
    );

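From there it helps to generate a mapping from any given animal to all of its ancestors. This is called a closure table; you can google closure tables for more background. It is built with recursive SQL or, in our case, since you're using 10g, with Oracle's hierarchical queries. For this example I'll call it Ancestry: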

with ancestry as (
select CONNECT_BY_ROOT id id
     , CONNECT_BY_ROOT gender gender
     , id ancestor_id
     , gender ancestor_gender
     , level-1 lvl
  from animal
  connect by id in (prior sire_id, prior dam_id)
)
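Conceptually, this CTE is a closure table: one row per (animal, ancestor) path, including each animal paired with itself at `lvl` 0. Because there is no `START WITH`, every row of `animal` acts as a root.

Here is a rough Python equivalent. It assumes the table is a dict of `id -> (sire_id, dam_id)` and leaves out the gender columns for brevity; the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for the animal table: id -> (sire_id, dam_id)
Animals = Dict[int, Tuple[Optional[int], Optional[int]]]


def ancestry_closure(animals: Animals) -> List[Tuple[int, int, int]]:
    """Return (id, ancestor_id, lvl) rows, one per path, with every animal as a root."""
    rows = []
    for root in animals:                      # no START WITH: every row is a root
        stack = [(root, 0)]
        while stack:
            node, lvl = stack.pop()
            rows.append((root, node, lvl))    # CONNECT_BY_ROOT id, id, level-1
            for parent in animals[node]:
                if parent in animals:         # only parents with a row of their own are followed
                    stack.append((parent, lvl + 1))
    return rows


ANIMAL = {111: (112, 212), 112: (113, 213), 212: (116, 216)}
for row in sorted(ancestry_closure(ANIMAL)):
    print(row)
```

As with `CONNECT BY`, an ancestor reachable along two paths produces two rows. That is one reason the later query aggregates.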

From there you can find all animals that share a common ancestor with a fairly simple join:

select m.id sire_id
     , f.id dam_id
     , m.ancestor_id
     , m.ancestor_gender
     , m.lvl sire_lvl
     , f.lvl dam_lvl
  from ancestry m
  join ancestry f
    on m.ancestor_id = f.ancestor_id
   and m.gender = 'M'
   and f.gender = 'F';

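That query lists every pairing of a male and a female animal along with all of their common ancestors. That's a bit much, so we want to narrow it down to the pairs you're interested in and to the first common ancestor. To do that, we add a WHERE clause restricting it to the pairs of interest and use aggregation to get down to the first ancestor: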

select m.id sire_id
     , f.id dam_id
     , max(m.ancestor_id) keep (dense_rank first order by least(m.lvl,f.lvl)) first_ancestor
     , max(m.ancestor_gender) keep (dense_rank first order by least(m.lvl,f.lvl)) ancestor_gender
     , min(m.lvl) sire_lvl
     , min(f.lvl) dam_lvl
  from ancestry m
  join ancestry f
    on m.ancestor_id = f.ancestor_id
   and m.gender = 'M'
   and f.gender = 'F'
 where (m.id, f.id) in ((111,211))
 group by m.id, f.id;
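For one (sire, dam) group, the aggregate can be read as follows. Among all shared ancestors, keep only those with the smallest `least(m.lvl, f.lvl)` and return the largest of their IDs (`max` merely breaks ties). `sire_lvl` and `dam_lvl`, by contrast, are plain minima over every shared ancestor.

The small Python emulation below makes that visible; it works on hypothetical closure rows `(id, ancestor_id, lvl)` and leaves out the gender filter. It shows that `sire_lvl` and `dam_lvl` are not necessarily the depths of `first_ancestor`.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int, int]  # (id, ancestor_id, lvl) as produced by the ancestry CTE


def first_common_ancestor(closure: List[Row], male_id: int,
                          female_id: int) -> Optional[Tuple[int, int, int]]:
    """Emulate first_ancestor, sire_lvl and dam_lvl for one (sire, dam) group."""
    male = [(a, lvl) for (i, a, lvl) in closure if i == male_id]
    female = [(a, lvl) for (i, a, lvl) in closure if i == female_id]
    # Inner join on ancestor_id
    shared = [(a, m_lvl, f_lvl) for (a, m_lvl) in male for (b, f_lvl) in female if a == b]
    if not shared:
        return None  # the inner join produces no group, so the query returns no row
    best = min(min(m_lvl, f_lvl) for _, m_lvl, f_lvl in shared)
    # MAX(...) KEEP (DENSE_RANK FIRST ORDER BY LEAST(m.lvl, f.lvl))
    first_ancestor = max(a for a, m_lvl, f_lvl in shared if min(m_lvl, f_lvl) == best)
    sire_lvl = min(m_lvl for _, m_lvl, _ in shared)
    dam_lvl = min(f_lvl for _, _, f_lvl in shared)
    return first_ancestor, sire_lvl, dam_lvl


# Hypothetical rows: ancestor 7 is at depths (1, 5), ancestor 9 at depths (2, 2)
closure = [(111, 111, 0), (111, 7, 1), (111, 9, 2),
           (211, 211, 0), (211, 7, 5), (211, 9, 2)]
print(first_common_ancestor(closure, 111, 211))
```

Here `first_ancestor` is 7, but `dam_lvl` is 2, which belongs to ancestor 9. If you need both depths of the first ancestor, they would have to be taken with `keep (dense_rank first ...)` as well.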

Putting it all together gives the final query:

with ancestry as (
select CONNECT_BY_ROOT id id
     , CONNECT_BY_ROOT gender gender
     , id ancestor_id
     , gender ancestor_gender
     , level-1 lvl
  from animal
  connect by id in (prior sire_id, prior dam_id)
)
select m.id sire_id
     , f.id dam_id
     , max(m.ancestor_id) keep (dense_rank first order by least(m.lvl,f.lvl)) first_ancestor
     , max(m.ancestor_gender) keep (dense_rank first order by least(m.lvl,f.lvl)) ancestor_gender
     , min(m.lvl) sire_lvl
     , min(f.lvl) dam_lvl
  from ancestry m
  join ancestry f
    on m.ancestor_id = f.ancestor_id
   and m.gender = 'M'
   and f.gender = 'F'
 where (m.id, f.id) in ((111,211))
 group by m.id, f.id;

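You can see it in action in this db<>fiddle. Because Examples 2 and 3 differ from Example 1, the fiddle adds a separate male line for Example 2 and a separate female line for Example 3. This gives the pairs (1111, 1211), (2111, 1211) and (1111, 3211) for Examples 1, 2 and 3 respectively.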

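This only returns a row when the pair of animals has a common ancestor. It also pre-generates the entire ancestry closure, which can be slow for a large adjacency list. For a more efficient query, the ancestry closure can be restricted to the animals of interest with a START WITH condition. The search depth can also be limited in either (or both) of two places: the WHERE clause of the output query or the WHERE clause of the ancestry query.

Finally, to meet your requirement that a pair with no common ancestor still returns a row showing a level of zero, a few small changes are needed. First, the ancestry CTE is modified so that each animal's row for itself (depth zero) has a NULL level; this is what makes the aggregation work. Then the aggregate columns and the join condition are updated slightly to allow for pairs with no common ancestor. Here is the modified query: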

with ancestry as (
select CONNECT_BY_ROOT id id
     , CONNECT_BY_ROOT gender gender
     , id ancestor_id
     , gender ancestor_gender
     , case level when 1 then null else level-1 end lvl
     , level-1 lvl0
  from animal

 -- Limit depth to 3 generations
 where level-1 <= 3 

 connect by id in (prior sire_id, prior dam_id)

 -- Only build ancestry closure for these animals
 start with id in (1111,1211,2111,3211)
)
select m.id sire_id
     , f.id dam_id
     , max(nvl2(m.lvl,m.ancestor_id,null)) keep (dense_rank first order by least(m.lvl,f.lvl) nulls last) first_ancestor
     , max(nvl2(m.lvl,m.ancestor_gender,null)) keep (dense_rank first order by least(m.lvl,f.lvl) nulls last) ancestor_gender
     , nvl(min(m.lvl),0) sire_lvl
     , nvl(min(f.lvl),0) dam_lvl
  from ancestry m
  join ancestry f
    on (m.ancestor_id = f.ancestor_id or (m.id, f.id) in ((m.ancestor_id, f.ancestor_id)))
   and m.gender = 'M'
   and f.gender = 'F'
 where (m.id, f.id) in ((1111,1211) -- First example no common ancestors
                       ,(2111,1211) -- 2nd ex common ancestors in male line
                       ,(1111,3211))-- 3rd ex common ancestry of sire & dam

   -- Limit to at most 3 generations
   and greatest(m.lvl0, f.lvl0) <= 3

 group by m.id, f.id;
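The reason the level-zero row gets a NULL `lvl` shows up in a small Python emulation of the modified query for a single pair. The emulation uses hypothetical closure rows `(id, ancestor_id, lvl0)`, with Python's `None` standing in for SQL NULL. It is a reading aid, not a model of how Oracle executes the query.

The mechanism works like this:

* The extra join condition always pairs the sire's own row with the dam's own row, so every pair of interest forms a group.
* `least()` of a NULL is NULL and the ordering is `NULLS LAST`, so that pairing only wins when nothing else was found.
* `nvl(min(lvl), 0)` then turns the all-NULL minimum into 0.

```python
from typing import List, Tuple

Row = Tuple[int, int, int]  # (id, ancestor_id, lvl0); lvl0 = 0 on the animal's own row


def pair_summary(closure: List[Row], male_id: int, female_id: int, max_depth: int = 3):
    """Emulate one (sire, dam) group of the modified query: (first_ancestor, sire_lvl, dam_lvl)."""
    def rows_for(animal_id):
        # lvl is NULL (None) on the animal's own row, like CASE level WHEN 1 THEN NULL ...
        return [(a, None if lvl0 == 0 else lvl0) for (i, a, lvl0) in closure
                if i == animal_id and lvl0 <= max_depth]

    # Join on a shared ancestor, or pair the sire's own row with the dam's own row
    joined = [(m_anc, m_lvl, f_lvl)
              for (m_anc, m_lvl) in rows_for(male_id)
              for (f_anc, f_lvl) in rows_for(female_id)
              if m_anc == f_anc or (m_anc, f_anc) == (male_id, female_id)]

    # LEAST(m.lvl, f.lvl) is NULL when either argument is NULL
    ranks = [None if m is None or f is None else min(m, f) for _, m, f in joined]
    known = [r for r in ranks if r is not None]
    best = min(known) if known else None  # ORDER BY ... NULLS LAST prefers a non-NULL rank
    first_group = [row for row, rank in zip(joined, ranks) if rank == best]

    # MAX(NVL2(m.lvl, m.ancestor_id, NULL)) KEEP (DENSE_RANK FIRST ...): MAX skips NULLs
    ids = [a for a, m_lvl, _ in first_group if m_lvl is not None]
    first_ancestor = max(ids) if ids else None
    # NVL(MIN(...), 0): MIN skips NULLs, so an all-NULL group becomes 0
    sire_lvl = min((m for _, m, _ in joined if m is not None), default=0)
    dam_lvl = min((f for _, _, f in joined if f is not None), default=0)
    return first_ancestor, sire_lvl, dam_lvl


# Hypothetical closure rows: no shared ancestor, and 114 shared at depths (3, 2)
closure_none = [(1111, 1111, 0), (1111, 1112, 1), (1211, 1211, 0), (1211, 1212, 1)]
closure_shared = [(1111, 1111, 0), (1111, 114, 3), (3211, 3211, 0), (3211, 114, 2)]
print(pair_summary(closure_none, 1111, 1211))
print(pair_summary(closure_shared, 1111, 3211))
```

Without the NULL on the self rows, the roots' pairing would rank at depth 0 and always be chosen as the first group. That is why the answer keeps a separate `lvl0` column for the depth filter.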

Tags: sql, oracle, oracle10g