Identifying duplicates within a table: looking for query advice

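I'm trying to identify duplicate contact records within an account, and I'm looking for the best way to do it. There is an account table and a contact table. Below is the query I came up with. It gives me what I need, but I suspect there is a better or more efficient way to do this, so I'm looking for any feedback or advice. Thanks in advance!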

SELECT * FROM sysdba.CONTACT a WITH(NOLOCK)
WHERE EXISTS
(
SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL FROM sysdba.CONTACT b WITH(NOLOCK)
GROUP BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL
HAVING COUNT(*) > 1
AND a.ACCOUNTID = b.ACCOUNTID AND a.FIRSTNAME = b.FIRSTNAME AND a.LASTNAME = b.LASTNAME AND a.EMAIL = b.EMAIL
)
ORDER BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL

Here is another way I could do this, but having to use DISTINCT seems ugly:

SELECT DISTINCT a.CONTACTID, a.FIRSTNAME, a.LASTNAME, a.EMAIL FROM sysdba.CONTACT a WITH(NOLOCK)
JOIN sysdba.CONTACT b WITH(NOLOCK)
ON a.ACCOUNTID = b.ACCOUNTID AND a.FIRSTNAME = b.FIRSTNAME AND a.LASTNAME = b.LASTNAME AND a.EMAIL = b.EMAIL AND a.CONTACTID != b.CONTACTID
ORDER BY a.CONTACTID, a.FIRSTNAME, a.LASTNAME, a.EMAIL

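When I compare the execution plans of the two, the first query comes out at 37% of the cost and the second at 63%. That surprised me, because I had always assumed (evidently wrongly) that using a join is faster than relying on a WHERE clause.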

When you are trying to identify duplicates, it is quite common to use window functions such as COUNT() OVER (...) or ROW_NUMBER() OVER (...).

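The query below should return the groups of records in which more than one CONTACTID shares the same combination of ACCOUNTID, FIRSTNAME, LASTNAME and EMAIL. In other words, this query returns the records that have duplicates, together with their duplicates: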

;WITH cteCONTACT
AS (
    SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL, CONTACTID,
        CNT = COUNT(*) OVER (PARTITION BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL)
    FROM sysdba.CONTACT
)
SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL, CONTACTID
FROM cteCONTACT
WHERE CNT > 1;

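The query below should return only the duplicates, without the records they are duplicates of. Within each group, the row with the lowest CONTACTID is treated as the original and is left out: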

;WITH cteCONTACT
AS (
    SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL, CONTACTID,
        NUM = ROW_NUMBER() OVER (
            PARTITION BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL
            ORDER BY CONTACTID)
    FROM sysdba.CONTACT
)
SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL, CONTACTID
FROM cteCONTACT
WHERE NUM > 1;
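
To make the difference between the COUNT(*) OVER and ROW_NUMBER() OVER approaches concrete, here is a small sketch that runs both against a table variable. The table variable @CONTACT, its column types and the sample rows are made up for illustration and are not taken from the real sysdba.CONTACT schema. It assumes SQL Server 2008 or later for the multi-row VALUES syntax.

```sql
-- Hypothetical sample data: the real sysdba.CONTACT table has more columns
-- and its own data types.
DECLARE @CONTACT TABLE (
    CONTACTID INT PRIMARY KEY,
    ACCOUNTID INT,
    FIRSTNAME VARCHAR(50),
    LASTNAME  VARCHAR(50),
    EMAIL     VARCHAR(100)
);

INSERT INTO @CONTACT (CONTACTID, ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL)
VALUES (1, 100, 'Ann', 'Lee', 'ann@example.com'),
       (2, 100, 'Ann', 'Lee', 'ann@example.com'),  -- duplicate of 1
       (3, 100, 'Bob', 'Ray', 'bob@example.com'),  -- unique
       (4, 200, 'Ann', 'Lee', 'ann@example.com'),  -- same person, other account
       (5, 100, 'Ann', 'Lee', 'ann@example.com');  -- duplicate of 1

-- COUNT(*) OVER: every row whose group has more than one member.
WITH cteCONTACT AS (
    SELECT CONTACTID,
        CNT = COUNT(*) OVER (PARTITION BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL)
    FROM @CONTACT
)
SELECT CONTACTID
FROM cteCONTACT
WHERE CNT > 1
ORDER BY CONTACTID;   -- returns 1, 2, 5

-- ROW_NUMBER() OVER: every row except the first one in its group.
WITH cteCONTACT AS (
    SELECT CONTACTID,
        NUM = ROW_NUMBER() OVER (
            PARTITION BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL
            ORDER BY CONTACTID)
    FROM @CONTACT
)
SELECT CONTACTID
FROM cteCONTACT
WHERE NUM > 1
ORDER BY CONTACTID;   -- returns 2, 5
```

One thing to keep in mind: PARTITION BY puts NULL values into the same group, whereas the equality comparisons in the EXISTS and self-join queries from the question never match NULL to NULL. If EMAIL can be NULL, the window-function queries may therefore report more duplicates than the original two.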