How to create a new identifier in SQL for related addresses that have multiple relationships?

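I have a dataset like the one below that includes addresses and a customer_id. In this example, multiple customers can ship to the same address, and one customer can ship to multiple addresses. I would like to use these relationships to create a new ID that links all related addresses and customer_ids under a single new identifier.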

Original table

+-------+-----+
|address| id  |
+-------+-----+
| 11 rd | aa  |
+-------+-----+
| 11 rd | ab  |
+-------+-----+
| 21 dr | ac  |
+-------+-----+
| 21 dr | ab  |
+-------+-----+
| 31 rd | ad  |
+-------+-----+
| 21 dr | abb |
+-------+-----+
| 41 dr | abb |
+-------+-----+

Desired output table

+-------+-----+--------+
|address| id  | new_id |
+-------+-----+--------+
| 11 rd | aa  | 1      |
+-------+-----+--------+
| 11 rd | ab  | 1      |
+-------+-----+--------+
| 21 dr | ac  | 1      |
+-------+-----+--------+
| 21 dr | ab  | 1      |
+-------+-----+--------+
| 31 rd | ad  | 2      |
+-------+-----+--------+
| 21 dr | abb | 1      |
+-------+-----+--------+
| 41 dr | abb | 1      |
+-------+-----+--------+

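The challenge is that the number of related associations could be unlimited, and my SQL below only works for two joins.

Here is my SQL. It works for this small dataset, but as it expands to more relationships it needs more joins. Any ideas on the correct way to structure this would be greatly appreciated!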

CREATE TABLE #temp (address char(40), id char(40))
INSERT INTO #temp VALUES ('11 rd', 'aa');
INSERT INTO #temp VALUES ('11 rd', 'ab');
INSERT INTO #temp VALUES ('21 dr', 'ac');
INSERT INTO #temp VALUES ('21 dr', 'ab');
INSERT INTO #temp VALUES ('31 rd', 'ad');
INSERT INTO #temp VALUES ('21 dr', 'abb');
INSERT INTO #temp VALUES ('41 dr', 'abb');

SELECT 
*
,DENSE_RANK() OVER(PARTITION BY id ORDER BY address ASC)as address_rank
,DENSE_RANK() OVER(PARTITION BY address ORDER BY id ASC)as id_rank
INTO #temp2 
FROM #temp 

SELECT a.address,a.id_rank,a.id,b.address as combined_address
INTO #temp3
FROM #temp2 a
LEFT JOIN #temp2 b ON a.id=b.id AND b.address_rank = 1

SELECT 
 a.address
,a.id
,DENSE_RANK() OVER(ORDER BY b.combined_address ASC)as new_id
FROM #temp3 a
LEFT JOIN #temp3 b ON a.combined_address=b.address and b.id_rank = 1

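This is a graph-walking problem. I would start by rephrasing the data slightly. It helps for each row to have a unique identifier, appropriately called id, so I am going to rename your id column to name.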
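A minimal sketch of that rephrased table, assuming it is called temp (the name the query below reads from). The column types and the use of an IDENTITY column for the row identifier are assumptions made for illustration, not something stated in the question:

```sql
-- Assumed setup: a surrogate row id plus the original two columns,
-- with the customer identifier renamed from id to name.
CREATE TABLE temp (
    id      int IDENTITY(1,1) PRIMARY KEY,  -- unique row identifier
    address varchar(40) NOT NULL,
    name    varchar(40) NOT NULL            -- was "id" in the question
);

-- Same sample rows as in the question, in the same order.
INSERT INTO temp (address, name) VALUES
    ('11 rd', 'aa'),
    ('11 rd', 'ab'),
    ('21 dr', 'ac'),
    ('21 dr', 'ab'),
    ('31 rd', 'ad'),
    ('21 dr', 'abb'),
    ('41 dr', 'abb');
```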

Then, construct the edges and walk the graph using a recursive CTE. In SQL Server, this looks like:

with edges as (
      -- two rows are connected when they share an address or a name
      select distinct t1.id as id1, t2.id as id2
      from temp t1 join
           temp t2
           on (t1.address = t2.address or t1.name = t2.name) and (t1.id <> t2.id)
     ),
     cte as (
      -- anchor: every row starts a walk at itself
      select id, id as next_id, convert(varchar(max), concat(',', id, ',')) as visited, 1 as lev
      from temp
      union all
      -- step: follow an edge to a row this walk has not visited yet
      select cte.id, e.id2, concat(visited, e.id2, ','), lev + 1
      from cte join
           edges e
           on cte.next_id = e.id1
      where visited not like concat('%,', e.id2, ',%') and lev < 10
     )
-- the smallest reachable row id identifies each group
select id, min(next_id) as group_id
from cte
group by id;

Here is a db<>fiddle.
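The query above returns one row per id together with the smallest row id reachable from it, which works as a group key but is not yet the consecutive new_id from the desired output. One possible way to get there is to rank those keys with DENSE_RANK and join back to the table. The sketch below assumes the temp table with columns id, address and name; the groups CTE and its group_id column are illustrative names, not part of the original query:

```sql
with edges as (
      select distinct t1.id as id1, t2.id as id2
      from temp t1 join
           temp t2
           on (t1.address = t2.address or t1.name = t2.name) and (t1.id <> t2.id)
     ),
     cte as (
      select id, id as next_id, convert(varchar(max), concat(',', id, ',')) as visited, 1 as lev
      from temp
      union all
      select cte.id, e.id2, concat(visited, e.id2, ','), lev + 1
      from cte join
           edges e
           on cte.next_id = e.id1
      where visited not like concat('%,', e.id2, ',%') and lev < 10
     ),
     groups as (
      -- one group key per row: the smallest row id it can reach
      select id, min(next_id) as group_id
      from cte
      group by id
     )
select t.address,
       t.name as id,                                      -- original customer id
       dense_rank() over (order by g.group_id) as new_id  -- 1, 2, ... per group
from temp t join
     groups g
     on g.id = t.id
order by t.id;
```

On the sample data this should place the six connected rows in group 1 and the '31 rd' / 'ad' row in group 2. Note that lev < 10 caps each walk at ten rows, so longer chains would need a higher limit, and SQL Server stops a recursive CTE after 100 levels by default unless OPTION (MAXRECURSION n) is specified.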