How to create a new identifier in SQL for related addresses that have multiple relationships?
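I have a dataset like the one below that includes addresses and a customer_id. In this example, multiple customers can ship to the same address, and one customer can ship to multiple addresses. I would like to use these relationships to create a new ID that links all related addresses and customer_ids under a single new identifier.
Original table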
+-------+-----+
|address| id |
+-------+-----+
| 11 rd | aa |
+-------+-----+
| 11 rd | ab |
+-------+-----+
| 21 dr | ac |
+-------+-----+
| 21 dr | ab |
+-------+-----+
| 31 rd | ad |
+-------+-----+
| 21 dr | abb |
+-------+-----+
| 41 dr | abb |
+-------+-----+
Desired output table
+-------+-----+--------+
|address| id | new_id |
+-------+-----+--------+
| 11 rd | aa | 1 |
+-------+-----+--------+
| 11 rd | ab | 1 |
+-------+-----+--------+
| 21 dr | ac | 1 |
+-------+-----+--------+
| 21 dr | ab | 1 |
+-------+-----+--------+
| 31 rd | ad | 2 |
+-------+-----+--------+
| 21 dr | abb | 1 |
+-------+-----+--------+
| 41 dr | abb | 1 |
+-------+-----+--------+
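The challenge is that the number of related associations may be unbounded, while my SQL below only handles two levels of joins.
Here is my SQL. It works for this small dataset, but scaling it to more relationships requires more joins. Any thoughts on the right way to structure this would be greatly appreciated!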
CREATE TABLE #temp (address char(40), id char(40));
INSERT INTO #temp VALUES ('11 rd', 'aa');
INSERT INTO #temp VALUES ('11 rd', 'ab');
INSERT INTO #temp VALUES ('21 dr', 'ac');
INSERT INTO #temp VALUES ('21 dr', 'ab');
INSERT INTO #temp VALUES ('31 rd', 'ad');
INSERT INTO #temp VALUES ('21 dr', 'abb');
INSERT INTO #temp VALUES ('41 dr', 'abb');
SELECT
     *
    ,DENSE_RANK() OVER (PARTITION BY id ORDER BY address ASC) AS address_rank
    ,DENSE_RANK() OVER (PARTITION BY address ORDER BY id ASC) AS id_rank
INTO #temp2
FROM #temp;

SELECT a.address, a.id_rank, a.id, b.address AS combined_address
INTO #temp3
FROM #temp2 a
LEFT JOIN #temp2 b ON a.id = b.id AND b.address_rank = 1;

SELECT
     a.address
    ,a.id
    ,DENSE_RANK() OVER (ORDER BY b.combined_address ASC) AS new_id
FROM #temp3 a
LEFT JOIN #temp3 b ON a.combined_address = b.address AND b.id_rank = 1;
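This is a graph-walking problem. I'll start by rephrasing the data slightly: it helps for each row to have a unique identifier, appropriately called id. So, I have renamed your id to name.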
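A minimal sketch of that reshaped table, in case it helps to see it spelled out. The IDENTITY surrogate key, the varchar(40) column types, and the use of a permanent table called temp (rather than the question's #temp) are assumptions made here for illustration; only the column names id, address, and name are taken from the query that follows.

```sql
-- Assumed setup: one row per (address, customer) pair, plus a unique row id.
CREATE TABLE temp (
    id      int IDENTITY(1,1) PRIMARY KEY,  -- surrogate key, unique per row
    address varchar(40) NOT NULL,
    name    varchar(40) NOT NULL            -- the question's customer id column
);

-- Same seven rows as in the question.
INSERT INTO temp (address, name) VALUES
    ('11 rd', 'aa'),
    ('11 rd', 'ab'),
    ('21 dr', 'ac'),
    ('21 dr', 'ab'),
    ('31 rd', 'ad'),
    ('21 dr', 'abb'),
    ('41 dr', 'abb');
```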
Then, construct the edges and walk the graph using a recursive CTE. In SQL Server, this looks like:
with edges as (
select distinct t1.id as id1, t2.id as id2
from temp t1 join
temp t2
on (t1.address = t2.address or t1.name = t2.name) and (t1.id <> t2.id)
),
cte as (
select id, id as next_id, convert(varchar(max), concat(',', id, ',')) as visited, 1 as lev
from temp
union all
select cte.id, e.id2, concat(visited, e.id2, ','), lev + 1
from cte join
edges e
on cte.next_id = e.id1
where visited not like concat('%,', e.id2, ',%') and lev < 10
)
select id, min(next_id)
from cte
group by id;
Here is a db<>fiddle.
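The query above returns, for each row id, the smallest row id reachable from it, which acts as a label for the connected group. To get the consecutive new_id from the question, one option is to rank those labels and join back to the table. The following is a sketch along those lines, not part of the original answer; the components CTE and the component_id alias are names introduced here, and the table is assumed to have the id, address, and name columns described above.

```sql
with edges as (
      -- Two rows are linked when they share an address or a name.
      select distinct t1.id as id1, t2.id as id2
      from temp t1 join
           temp t2
           on (t1.address = t2.address or t1.name = t2.name) and t1.id <> t2.id
     ),
     cte as (
      -- Walk outward from every row, tracking visited ids to avoid cycles.
      select id, id as next_id, convert(varchar(max), concat(',', id, ',')) as visited, 1 as lev
      from temp
      union all
      select cte.id, e.id2, concat(visited, e.id2, ','), lev + 1
      from cte join
           edges e
           on cte.next_id = e.id1
      where visited not like concat('%,', e.id2, ',%') and lev < 10
     ),
     components as (
      -- Smallest reachable id serves as the group label (hypothetical alias).
      select id, min(next_id) as component_id
      from cte
      group by id
     )
select t.address, t.name,
       dense_rank() over (order by c.component_id) as new_id
from temp t join
     components c
     on c.id = t.id
order by t.id;
```

Note that the lev < 10 condition caps the depth of each walk, so groups whose members are connected only through longer chains would need a higher limit. The walk also enumerates every non-repeating path, which can grow quickly on densely connected data.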