SQL Finding Dupe Names with Distinct ID
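I have a customer list in which a bunch of entries are duplicates ('Acme Inc', 'Acme, Inc', 'Acme Inc.', 'Acme, Inc.'), and they all have different IDs.
However, each ID also has multiple addresses. Like...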
+-------+---------------+-------------------+-----------+---+-------+
|ID     |Name           |Address            |City       |St |Zip    |
+-------+---------------+-------------------+-----------+---+-------+
|001    |Acme Inc       |123 Address St     |Columbus   |OH |43081  |
|001    |Acme Inc       |321 Street St      |Columbus   |OH |43081  |
|001    |Acme Inc       |456 Blanket Blvd   |Columbus   |OH |43081  |
|002    |Acme, Inc      |123 Babel St       |Columbus   |OH |43081  |
|002    |Acme, Inc      |321 Acorn Rd       |Columbus   |OH |43081  |
|002    |Acme, Inc      |456 Lancer Blvd    |Columbus   |OH |43081  |
|003    |Baker          |456 Blanket Blvd   |Columbus   |OH |43081  |
|004    |Peterson       |456 Blanket Blvd   |Columbus   |OH |43081  |
|005    |Plumbers Inc   |123 Address St     |Columbus   |OH |43081  |
|006    |Plumbers, LLC  |321 Street St      |Columbus   |OH |43081  |
|007    |Acme, Inc.     |123 Address St     |Columbus   |OH |43081  |
+-------+---------------+-------------------+-----------+---+-------+
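I have a function that normalizes the names, so every Acme variant becomes 'Acme' and both Plumbers variants become 'Plumbers'.
What I want is a list of the IDs and Names that are duplicated. The goal is to report the records that have unique IDs but duplicate names.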
+-------+---------------+
|ID     |Name           |
+-------+---------------+
|001    |Acme Inc       |
|002    |Acme, Inc      |
|007    |Acme, Inc.     |
|005    |Plumbers Inc   |
|006    |Plumbers, LLC  |
+-------+---------------+
I tried this:
SELECT
    DISTINCT [Name],
    ( SELECT strNew FROM [fn_strNorm](2, [Name]) ) AS [NewName]
FROM [Processed_Vendors]
WHERE
    [VendorID] <> '' AND
    [VendorID] IS NOT NULL AND
    [Name] <> '' AND
    [Name] IS NOT NULL
GROUP BY [NewName]
HAVING COUNT(*) > 1
ORDER BY [NewName]
I also tried putting them into a [dupe_names] table and joining the two, but I always get multiple records from the same ID:
SELECT
    pv.[VendorID],
    pv.[Name]
FROM [dupe_names] n
LEFT JOIN [Processed_Vendors] pv
    ON pv.[Name] = n.[Name]
ORDER BY pv.[Name]

SELECT
    'Name Match' AS [Reason],
    pv.[VendorID],
    pv.[Name]
FROM [dupe_names] n
LEFT JOIN [Processed_Vendors] pv
    ON pv.[Name] = n.[Name]
    AND ( SELECT strNew FROM [dbo].[fn_strNorm](2, pv.[Name]) ) = n.[NewName]
ORDER BY pv.[Name]
I think I'm overthinking this, or the migraine I'm sporting is clouding my thinking.
Either way, I appreciate the help.
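One approach is to use a CTE to find the distinct count of name variants for each normalized name. Then join it to your current table and keep only the records whose normalized name has more than one variant.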
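WITH cte AS (
    -- One row per normalized name, with the number of distinct spellings it covers.
    -- Assumes fn_strNorm can be called as a scalar function; the question calls it
    -- as a table-valued function (SELECT strNew FROM ...), which would need CROSS APPLY.
    SELECT [dbo].[fn_strNorm](2, Name) AS NmName, COUNT(DISTINCT Name) AS cnt
    FROM Processed_Vendors
    GROUP BY [dbo].[fn_strNorm](2, Name)
)
-- DISTINCT collapses the multiple address rows per vendor into one row per ID.
-- VendorID is the ID column as named in the question's queries.
SELECT DISTINCT pv.VendorID, pv.Name
FROM Processed_Vendors pv
INNER JOIN cte t
    ON t.NmName = [dbo].[fn_strNorm](2, pv.Name)
WHERE t.cnt > 1;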
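If you want to try the logic without touching the real table, here is a minimal sketch that runs the same CTE against the sample rows from the question. It rests on several assumptions: it uses Python's standard `sqlite3` module with an in-memory SQLite database rather than SQL Server, `norm_name` is a hypothetical stand-in for `fn_strNorm` (it just keeps the first word and strips punctuation), and the ID column is called `VendorID` as in the question's queries.

```python
import re
import sqlite3


def norm_name(name):
    # Hypothetical stand-in for fn_strNorm: keep the first word, drop punctuation.
    first_word = name.split()[0]
    return re.sub(r"[^A-Za-z0-9]", "", first_word)


# Sample rows from the question (city, state and zip omitted for brevity).
rows = [
    ("001", "Acme Inc", "123 Address St"),
    ("001", "Acme Inc", "321 Street St"),
    ("001", "Acme Inc", "456 Blanket Blvd"),
    ("002", "Acme, Inc", "123 Babel St"),
    ("002", "Acme, Inc", "321 Acorn Rd"),
    ("002", "Acme, Inc", "456 Lancer Blvd"),
    ("003", "Baker", "456 Blanket Blvd"),
    ("004", "Peterson", "456 Blanket Blvd"),
    ("005", "Plumbers Inc", "123 Address St"),
    ("006", "Plumbers, LLC", "321 Street St"),
    ("007", "Acme, Inc.", "123 Address St"),
]

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL so the query can call it like fn_strNorm.
conn.create_function("norm_name", 1, norm_name)
conn.execute(
    "CREATE TABLE Processed_Vendors (VendorID TEXT, Name TEXT, Address TEXT)"
)
conn.executemany("INSERT INTO Processed_Vendors VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    -- Number of distinct spellings behind each normalized name.
    SELECT norm_name(Name) AS NmName, COUNT(DISTINCT Name) AS cnt
    FROM Processed_Vendors
    GROUP BY norm_name(Name)
)
SELECT DISTINCT pv.VendorID, pv.Name
FROM Processed_Vendors pv
INNER JOIN cte t
    ON t.NmName = norm_name(pv.Name)
WHERE t.cnt > 1
ORDER BY pv.VendorID
"""

dupes = conn.execute(query).fetchall()
for vendor_id, name in dupes:
    print(vendor_id, name)
```

With the sample data this lists IDs 001, 002, 005, 006 and 007, one row each, and leaves out Baker and Peterson. One thing to watch: `COUNT(DISTINCT Name)` only flags a normalized name when the spellings differ. If two different IDs could share exactly the same spelling, counting `COUNT(DISTINCT VendorID)` in the CTE instead would probably be closer to what you are after.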