递归 SQL - 计算层次结构中的后代数

Recursive SQL - count number of descendants in hierarchical structure

考虑一个包含以下列的数据库 table:

数据库表示来自 Math Genealogy Project 的数据,其中每个数学家通常有一个顾问,但也有两个顾问的情况。

使事情更清晰的视觉辅助:

如何计算每位数学家的后代数量?

我可能应该使用通用 Table 表达式(WITH RECURSIVE),但我现在几乎卡住了。我发现的所有类似示例都处理只有一个父级而不是两个的层次结构。

更新:

我修改了 Vladimir Baranov 提供的 SQL 服务器的解决方案,使其也适用于 PostgreSQL:

WITH RECURSIVE cte AS (
    SELECT m.id as start_id,
        m.id,
        m.name,
        m.advisor1,
        m.advisor2,
        1 AS level
    FROM public.mathematicians AS m

    UNION ALL

    SELECT cte.start_id,
        m.id,
        m.name,
        m.advisor1,
        m.advisor2,
        cte.level + 1 AS level
    FROM public.mathematicians AS m
    INNER JOIN cte ON cte.id = m.advisor1
               OR cte.id = m.advisor2
),
cte_distinct AS (
    SELECT DISTINCT start_id, id 
    FROM cte
)
SELECT cte_distinct.start_id,
    m.name,
    COUNT(*)-1 AS descendants_count
FROM cte_distinct
INNER JOIN public.mathematicians AS m ON m.id = cte_distinct.start_id
GROUP BY cte_distinct.start_id, m.name
ORDER BY cte_distinct.start_id

你没说你用的是什么DBMS。我将在此示例中使用 SQL 服务器,但它也适用于支持递归查询的其他数据库。

示例数据

我只输入了你树的右边部分,从欧拉开始。

最有趣的部分是拉格朗日和狄利克雷之间的多条路径。

DECLARE @T TABLE (ID int, name nvarchar(50), Advisor1ID int, Advisor2ID int);

INSERT INTO @T (ID, name, Advisor1ID, Advisor2ID) VALUES
(1,  'Euler', NULL, NULL),
(2,  'Lagrange', 1, NULL),
(3,  'Laplace', NULL, NULL),
(4,  'Fourier', 2, NULL),
(5,  'Poisson', 2, 3),
(6,  'Dirichlet', 4, 5),
(7,  'Lipschitz', 6, NULL),
(8,  'Klein', NULL, 7),
(9,  'Lindemann', 8, NULL),
(10, 'Furtwangler', 8, NULL),
(11, 'Hilbert', 9, NULL),
(12, 'Taussky-Todd', 10, NULL);

这是它的样子:

SELECT * FROM @T;

+----+--------------+------------+------------+
| ID |     name     | Advisor1ID | Advisor2ID |
+----+--------------+------------+------------+
|  1 | Euler        | NULL       | NULL       |
|  2 | Lagrange     | 1          | NULL       |
|  3 | Laplace      | NULL       | NULL       |
|  4 | Fourier      | 2          | NULL       |
|  5 | Poisson      | 2          | 3          |
|  6 | Dirichlet    | 4          | 5          |
|  7 | Lipschitz    | 6          | NULL       |
|  8 | Klein        | NULL       | 7          |
|  9 | Lindemann    | 8          | NULL       |
| 10 | Furtwangler  | 8          | NULL       |
| 11 | Hilbert      | 9          | NULL       |
| 12 | Taussky-Todd | 10         | NULL       |
+----+--------------+------------+------------+

查询

这是一个经典的递归查询,有两个有趣的地方。

1) CTE 的递归部分使用 Advisor1IDAdvisor2ID 连接到锚点部分:

        INNER JOIN CTE 
            ON CTE.ID = T.Advisor1ID
            OR CTE.ID = T.Advisor2ID

2) 由于后代可能有多条路径,递归查询可能会多次输出该节点。为了消除这些重复项,我在 CTE_Distinct 中使用了 DISTINCT。或许可以更高效地解决。

为了更好地理解查询的工作原理运行 每个 CTE 分别检查中间结果。

WITH
CTE
AS
(
    SELECT
        T.ID AS StartID
        ,T.ID
        ,T.name
        ,T.Advisor1ID
        ,T.Advisor2ID
        ,1 AS Lvl
    FROM @T AS T

    UNION ALL

    SELECT
        CTE.StartID
        ,T.ID
        ,T.name
        ,T.Advisor1ID
        ,T.Advisor2ID
        ,CTE.Lvl + 1 AS Lvl
    FROM
        @T AS T
        INNER JOIN CTE 
            ON CTE.ID = T.Advisor1ID
            OR CTE.ID = T.Advisor2ID
)
,CTE_Distinct
AS
(
    SELECT DISTINCT
        StartID
        ,ID
    FROM CTE
)
SELECT
    CTE_Distinct.StartID
    ,T.name
    ,COUNT(*) AS DescendantCount
FROM
    CTE_Distinct
    INNER JOIN @T AS T ON T.ID = CTE_Distinct.StartID
GROUP BY
    CTE_Distinct.StartID
    ,T.name
ORDER BY CTE_Distinct.StartID;

结果

+---------+--------------+-----------------+
| StartID |     name     | DescendantCount |
+---------+--------------+-----------------+
|       1 | Euler        |              11 |
|       2 | Lagrange     |              10 |
|       3 | Laplace      |               9 |
|       4 | Fourier      |               8 |
|       5 | Poisson      |               8 |
|       6 | Dirichlet    |               7 |
|       7 | Lipschitz    |               6 |
|       8 | Klein        |               5 |
|       9 | Lindemann    |               2 |
|      10 | Furtwangler  |               2 |
|      11 | Hilbert      |               1 |
|      12 | Taussky-Todd |               1 |
+---------+--------------+-----------------+

此处DescendantCount将节点本身算作后代。如果您想看到叶节点为 0 而不是 1,则可以从此结果中减去 1。

这里是SQL Fiddle.