Using a Recursive CTE function to check every row against every other row

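I'm trying to update a table so that I flag any entry that has a duplicate-name entry. I do some processing to strip common prefixes and suffixes, and can then run two names against each other with a fuzzy-match CLR function. I wrote this as a nested cursor, which currently takes about 4 hours to run through all the records, because I have to check every row against every other row. I've read that using a recursive CTE can improve performance significantly, but I'm a SQL noob and can't quite get it to work. I think I need to nest one recursive CTE inside another, but I'm not sure how.

Currently I have something like this: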

;WITH AllOrgs (CompanyId, CompanyRoleId, Name, Recognized, Level)
AS 
(
    SELECT C.CompanyId, C.CompanyRoleId, C.Name, C.Recognized, 1
    FROM Company C
    WHERE DuplicateOfCompanyId IS NULL
    UNION ALL
    SELECT C.CompanyId, C.CompanyRoleId, C.Name, R.Recognized, R.Level + 1
    FROM AllOrgs R INNER JOIN Company C
    ON C.CompanyId = R.CompanyId
), 
DuplicateOrgs (CompanyId, CompanyRoleId, Name, Recognized, Level)
As 
(
    SELECT * FROM AllOrgs
    WHERE Recognized = 0 -- Recognized is what the companies are marked when we are satisfied they aren't incorrect
)
UPDATE C
SET C.DuplicateOfCompanyId = A.CompanyId
FROM Company C JOIN DuplicateOrgs A
ON C.CompanyId = A.CompanyID
WHERE master.dbo.fnClrFuzzyMatch(dbo.fnCleanUpCompanyName(A.Name), dbo.fnCleanUpCompanyName(C.Name)) 
    > @CompanyNameMatchValueThreshold
AND A.CompanyRoleID = C.CompanyRoleId -- Role ID must match as duplicates who provide a different function are fine

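But whenever I try to run it I get "The statement terminated. The maximum recursion 100 has been exhausted before statement completion." So I'm obviously doing something stupid.

Your recursion never terminates because the recursive member always re-inserts the anchor row itself at the next level. An example with only one row in Company:

AllOrgs after the anchor executes: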
CompanyId1, CompanyRoleId1, name1, Recognized1, 1

AllOrgs after recursion 1:
CompanyId1, CompanyRoleId1, name1, Recognized1, 1
CompanyId1, CompanyRoleId1, name1, Recognized1, 2

AllOrgs after recursion 2:
CompanyId1, CompanyRoleId1, name1, Recognized1, 1
CompanyId1, CompanyRoleId1, name1, Recognized1, 2
CompanyId1, CompanyRoleId1, name1, Recognized1, 3

...
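
To see this growth for yourself, here is a minimal sketch that reproduces the trace above. It assumes a one-row table variable @Company standing in for the real Company table (the column types are guesses), and it adds an artificial WHERE R.Level < 3 cap that is not in the original query, purely so the statement finishes:

```sql
-- Hypothetical stand-in for the real Company table; column types are assumed.
DECLARE @Company TABLE
(
    CompanyId     int,
    CompanyRoleId int,
    Name          nvarchar(100),
    Recognized    bit
);
INSERT INTO @Company VALUES (1, 1, N'name1', 0);

;WITH AllOrgs (CompanyId, CompanyRoleId, Name, Recognized, Level)
AS
(
    -- Anchor member: every company at level 1.
    SELECT C.CompanyId, C.CompanyRoleId, C.Name, C.Recognized, 1
    FROM @Company C
    UNION ALL
    -- Recursive member: joins each row back to itself, so it always finds a match.
    SELECT C.CompanyId, C.CompanyRoleId, C.Name, R.Recognized, R.Level + 1
    FROM AllOrgs R INNER JOIN @Company C
    ON C.CompanyId = R.CompanyId
    WHERE R.Level < 3 -- artificial cap added for this demo only
)
SELECT * FROM AllOrgs ORDER BY Level;
```

With the cap in place the same company comes back at levels 1, 2 and 3. Without it, nothing ever makes the recursive member return an empty set, so the query keeps going until it hits the default limit of 100 recursions.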

Try a self-join instead:

UPDATE C 
SET DuplicateOfCompanyId = Dup.CompanyId
FROM Company C
JOIN Company Dup ON C.CompanyId <> Dup.CompanyID 
    AND master.dbo.fnClrFuzzyMatch(dbo.fnCleanUpCompanyName(C.Name), dbo.fnCleanUpCompanyName(Dup.Name)) > @CompanyNameMatchValueThreshold
    AND C.CompanyRoleID = Dup.CompanyRoleId

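Note: if a company has more than one duplicate, the join produces several candidate rows for it and SQL Server picks one without any guaranteed order, so the resulting DuplicateOfCompanyId can be arbitrary and inconsistent.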
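
If you need a predictable result, one option is to decide up front which row counts as the original. The sketch below is one possible approach, not the only one: it assumes the company with the lowest CompanyId in a group of matches should be treated as the original, that @CompanyNameMatchValueThreshold is a float (the type and the value 0.8 are placeholders), and that the fuzzy match gives the same score regardless of argument order.

```sql
-- Placeholder type and value; use whatever your procedure already declares.
DECLARE @CompanyNameMatchValueThreshold float = 0.8;

;WITH Matches (CompanyId, OriginalCompanyId)
AS
(
    -- For each company, find the lowest-numbered earlier company it matches.
    SELECT C.CompanyId, MIN(Dup.CompanyId)
    FROM Company C
    JOIN Company Dup ON Dup.CompanyId < C.CompanyId -- compare each pair only once
        AND C.CompanyRoleId = Dup.CompanyRoleId
        AND master.dbo.fnClrFuzzyMatch(dbo.fnCleanUpCompanyName(C.Name), dbo.fnCleanUpCompanyName(Dup.Name)) > @CompanyNameMatchValueThreshold
    GROUP BY C.CompanyId
)
UPDATE C
SET DuplicateOfCompanyId = M.OriginalCompanyId
FROM Company C
JOIN Matches M ON M.CompanyId = C.CompanyId;
```

Because of the Dup.CompanyId < C.CompanyId condition, the lowest-numbered company in each group is never flagged and every later match points at exactly one row. It also halves the number of fuzzy comparisons. Fuzzy matching is not transitive, though, so a flagged company can still point at a row that is itself flagged as a duplicate of something else.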