Consecutive transformations in SQL

Question

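I'm new to SQL, so there may be a standard way to do what I'm asking, or it may already have been answered. I don't know the right term for the kind of operation I'm after, so I don't know what to search for.

So, to my problem: I have a table that records transformations of an ID column used in other tables. It includes consecutive transformations, possibly several of them chained together. All rows are date-stamped. Here is a toy sample: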

|---------|-------|------------|
| from_id | to_id |       date |
|---------|-------|------------|
|    1001 |  2001 | 2019-01-01 |
|    1002 |  2002 | 2019-01-01 |
|    1003 |  2003 | 2019-02-02 |
|    2001 |  3001 | 2019-03-03 |
|    2002 |  3002 | 2019-03-03 |
|    3001 |  4001 | 2019-04-04 |
|---------|-------|------------|
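
To experiment with the answers below, the sample can be loaded into a small table. This is only a sketch: the table name `toy` is borrowed from the first answer (the second answer calls the same table `t`), and the column types are assumptions, since the question does not state them.

```sql
-- Assumed schema: integer IDs and a DATE column (types are not given in the question).
create table toy (
  from_id integer,
  to_id   integer,
  date    date
);

-- The six rows of the toy sample.
insert into toy (from_id, to_id, date) values
  (1001, 2001, '2019-01-01'),
  (1002, 2002, '2019-01-01'),
  (1003, 2003, '2019-02-02'),
  (2001, 3001, '2019-03-03'),
  (2002, 3002, '2019-03-03'),
  (3001, 4001, '2019-04-04');
```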

From this data, I want to create two tables:

One table that links each from_id to its final to_id. For my toy example, I want the following:

|---------|-------------|
| from_id | final_to_id |
|---------|-------------|
|    1001 |        4001 |
|    1002 |        3002 |
|    1003 |        2003 |
|    2001 |        4001 |
|    2002 |        3002 |
|    3001 |        4001 |
|---------|-------------|

I also want a table that links the IDs in both directions. For my toy example:

|------|------|
| id_1 | id_2 |
|------|------|
| 1001 | 2001 |
| 1001 | 3001 |
| 1001 | 4001 |
| 1002 | 2002 |
| 1002 | 3002 |
| 1003 | 2003 |
| 2001 | 1001 |
| 2001 | 3001 |
| 2001 | 4001 |
| 2002 | 1002 |
| 2002 | 3002 |
| 2003 | 1003 |
| 3001 | 1001 |
| 3001 | 2001 |
| 3001 | 4001 |
| 3002 | 1002 |
| 3002 | 2002 |
| 4001 | 1001 |
| 4001 | 2001 |
| 4001 | 3001 |
|------|------|

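These two results could of course be combined into a single table, in which the rows belonging to result 1 are simply marked with a flag.

So, how do I do this in SQL? Any help is much appreciated. Note that I don't know how many steps the longest chain of transformations contains, but I do know that it grows every day.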

Answer 1

For this, you need a recursive CTE. I think the simplest approach is to work backwards from the most recent date:

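with cte as (
      -- anchor: rows whose to_id is never transformed again (the end of a chain)
      select t.to_id, t.from_id, t.to_id as terminal_id
      from toy t
      where not exists (select 1 from toy t2 where t2.from_id = t.to_id)
      union all
      -- recursive step: walk one link backwards, carrying the terminal id along
      select t.to_id, t.from_id, cte.terminal_id
      from cte join
           toy t
           on t.to_id = cte.from_id
     )
select *
from cte;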

This generates half of the rows. For the other half, you can do:

select to_id, from_id, terminal_id
from cte
union all
select from_id, to_id, terminal_id
from cte;
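
The backwards walk can also produce the first table the question asks for. The following is a minimal sketch rather than the answerer's own query: it assumes the `toy` table name used above, seeds `terminal_id` from `to_id` on the last link of each chain, and uses the `recursive` keyword, which Snowflake treats as optional but some other engines require.

```sql
with recursive cte (to_id, from_id, terminal_id) as (
  -- anchor: the last link of each chain; its to_id is the final id
  select t.to_id, t.from_id, t.to_id
  from toy t
  where not exists (select 1 from toy t2 where t2.from_id = t.to_id)
  union all
  -- recursive step: attach each earlier link to the chain it feeds into
  select t.to_id, t.from_id, cte.terminal_id
  from cte
  join toy t on t.to_id = cte.from_id
)
select from_id, terminal_id as final_to_id
from cte
order by from_id;
```

On the toy sample this should return one row per from_id, matching the first desired table (for example 1001 paired with 4001 and 1003 paired with 2003).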

Answer 2

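As Gordon says, you need a recursive CTE, which embodies a particular kind of fixed-point recursion. I think it is easiest to think about this in two stages:

1. Compute the transitive closure; then
2. Use it to produce the results you want.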

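We will use a CTE for the first part. A CTE has two clauses, separated by "union all". The first clause runs once to prime the pump; the second runs repeatedly until it produces no output (or until the allowed number of iterations is exceeded). Each time the second clause runs, two things happen: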

The result is appended to the CTE's output; and
The result replaces the "working" value of the CTE as seen from inside the CTE.
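
The two-clause behaviour described above can be seen in isolation with a tiny counter. This is an illustrative sketch only; the CTE name `counter` and the upper bound of 5 are made up for the example.

```sql
with recursive counter (n) as (
  -- first clause: runs once and seeds the working table with a single row
  select 1
  union all
  -- second clause: runs against the previous round's rows only;
  -- once n reaches 5 it produces no rows and the recursion stops
  select n + 1
  from counter
  where n < 5
)
select n
from counter
order by n;
```

Each round contributes exactly one row here, so the final result is the five rows 1 through 5.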

With that in mind, here is a CTE that computes the transitive closure. It makes some important assumptions:

There are no cycles in your chains of IDs;
IDs are always increasing; and
The dates don't matter.

Here is the code:

with cte as (
  select from_id, to_id
  from t
  union all
  select t1.from_id, t2.to_id
  from cte t1 join t t2 on t1.to_id = t2.from_id
)
select * from cte;
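
If the no-cycles assumption cannot be guaranteed, the recursion would never reach a fixed point on its own. One possible safeguard, sketched here under assumptions, is to carry a step counter and stop at a cap. The `depth` column and the limit of 100 are hypothetical and should be chosen above the longest chain you expect.

```sql
with recursive cte (from_id, to_id, depth) as (
  -- first clause: every direct link is one step long
  select from_id, to_id, 1
  from t
  union all
  -- second clause: extend each known path by one more link
  select t1.from_id, t2.to_id, t1.depth + 1
  from cte t1
  join t t2 on t1.to_id = t2.from_id
  where t1.depth < 100  -- hypothetical cap: stops runaway recursion if a cycle exists
)
select from_id, to_id, depth
from cte;
```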

The first clause produces your original table (minus the date column):

Round 1:
FROM_ID    TO_ID   
-------  -------  
   1001     2001  
   1002     2002  
   1003     2003  
   2001     3001  
   2002     3002  
   3001     4001

The second clause then uses this result as its working table and joins it with the original table. This produces the next round:

Round 2:
FROM_ID    TO_ID   
-------  ------- 
   1001     3001  
   1002     3002  
   2001     4001

This is appended to the result, and it also becomes the working table for the next round. So our third round gives us:

Round 3:
FROM_ID    TO_ID   
-------  ------- 
   1001     4001

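The next round produces no rows, which means that no further round would produce anything new: the CTE has reached a fixed point. This is when the CTE terminates and gives us the final result: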

FROM_ID    TO_ID   
-------  -------  
   1001     2001  
   1002     2002  
   1003     2003  
   2001     3001  
   2002     3002  
   3001     4001  
   1001     3001  
   1002     3002  
   2001     4001  
   1001     4001

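We still need to do step 2, but from here it is relatively easy: your result is just the set of rows in the CTE that have the maximum TO_ID for each FROM_ID. We will add a little post-processing to the CTE: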

with cte as (
  select from_id, to_id
  from t
  union all
  select t1.from_id, t2.to_id
  from cte t1 join t t2 on t1.to_id = t2.from_id
)
select from_id, max(to_id) as to_id
from cte
group by from_id;

This gives us:

FROM_ID    TO_ID   
-------  ------- 
   1001     4001
   1002     3002
   1003     2003
   2001     4001
   2002     3002
   3001     4001
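
The max(to_id) step leans on the assumption that IDs always increase along a chain. If that cannot be relied on, one possible variant is to track how many steps each path took and keep the longest path per FROM_ID. This is a sketch under assumptions: the `depth` and `rn` columns and the `closure` and `ranked` names are introduced here for illustration, and it presumes each ID is transformed at most once, as in the toy sample.

```sql
with recursive closure (from_id, to_id, depth) as (
  select from_id, to_id, 1
  from t
  union all
  select c.from_id, src.to_id, c.depth + 1
  from closure c
  join t src on c.to_id = src.from_id
),
ranked as (
  -- rank the paths of each from_id, longest first
  select from_id, to_id,
         row_number() over (partition by from_id order by depth desc) as rn
  from closure
)
select from_id, to_id as final_to_id
from ranked
where rn = 1
order by from_id;
```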

There you go. As Gordon pointed out, the other problem should also be simple using the results of the same CTE:

with cte as (
  select from_id, to_id
  from t
  union all
  select t1.from_id, t2.to_id
  from cte t1 join t t2 on t1.to_id = t2.from_id
)
-- forward direction: earlier id -> later id
select from_id as id_1, to_id as id_2
from cte
union all
-- reverse direction: later id -> earlier id
select to_id as id_1, from_id as id_2
from cte;
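
The question also mentions combining both results into one table with a flag. A minimal sketch of that idea follows, reusing the same transitive closure and the same increasing-ID assumption. The `closure` and `final` names and the `is_final` column are hypothetical, chosen only for this example.

```sql
with recursive closure (from_id, to_id) as (
  select from_id, to_id
  from t
  union all
  select c.from_id, src.to_id
  from closure c
  join t src on c.to_id = src.from_id
),
final as (
  -- result 1: the last to_id reachable from each from_id
  select from_id, max(to_id) as final_to_id
  from closure
  group by from_id
)
-- forward links, flagged when they point at the final id
select c.from_id as id_1, c.to_id as id_2,
       case when c.to_id = f.final_to_id then 1 else 0 end as is_final
from closure c
join final f on f.from_id = c.from_id
union all
-- reverse links are never part of result 1
select c.to_id as id_1, c.from_id as id_2, 0 as is_final
from closure c;
```

Filtering on is_final = 1 recovers the first table, while the unfiltered output is the bidirectional table.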

Tags: sql, snowflake-cloud-data-platform