Identify Duplicates and Filter Unique Records From SQL Table

I have 3 tables in a SQL Server database.

Table 1: Employee

GroupId |   EmployeeId  |   IsDuplicate
=======================================
    1   |       101     |       0
    1   |       102     |       1
    1   |       103     |       1
--------|---------------|--------------
    2   |       201     |       1
    2   |       202     |       0
--------|---------------|--------------
    3   |       301     |       1
    3   |       302     |       1
    3   |       303     |       0
---------------------------------------

Table 2: Contacts

ContactId   |   ContactEmail    |   Name    |...
================================================
    11      |       c1@mail.com |   c1 x    |
    12      |       c2@mail.com |   c2 y    |
    13      |       c1@mail.com |   c1.x    |
    14      |       c3@mail.com |   c3.z    |
    15      |       c3@mail.com |   c3 z    |
------------|-------------------|-----------|
    21      |       d1@mail.com |   d1 a    |
    22      |       d2@mail.com |   d2 b    |
------------|-------------------|-----------|
    31      |       e1@mail.com |   e1 m    |
    32      |       e1@mail.com |   e1m     |
    33      |       e2@mail.com |   e2 n    |
--------------------------------------------

Table 3: EmployeeContacts

EmployeeId  |   ContactId
=========================
    101     |       11
    101     |       12
    102     |       13
    102     |       14
    103     |       15
------------|------------
    201     |       21
    202     |       22
------------|------------
    301     |       31
    302     |       32
    303     |       33
-------------------------

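Table 1 holds employees that have already been grouped together as duplicates. One employee per group is kept (IsDuplicate = 0), and the remaining employees will be deleted later. Each employee has one or more contacts. The contact details are stored in a separate 'Contacts' table, and the mapping of employees to contacts is stored in a separate 'EmployeeContacts' table. The Contacts table has several columns, but only the email address can be used to tell whether one contact duplicates another.

Problem statement: I need to write a script that collects the unique contacts of each group and rolls them up to that group's active employee (IsDuplicate = 0). That is, for each group of employees I have to identify the duplicate contacts in the 'Contacts' table and insert the unique contacts into the 'EmployeeContacts' table. So the target result is:

Table 3: EmployeeContacts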

EmployeeId  |   ContactId
=========================
    101     |       11
    101     |       12
    101     |       14 -- <-- Unique contact rolled up to the selected active employee
    102     |       13
    102     |       14
    103     |       15
------------|------------
    201     |       21
    202     |       22
    202     |       21 -- <-- Unique contact rolled up to the selected active employee
------------|------------
    301     |       31
    302     |       32
    303     |       33 
    303     |       31 -- <-- Unique contact rolled up to the selected active employee
-------------------------

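I tried to approach the problem like this: join all 3 tables and put the contact email, group ID, contact ID and employee ID into a temp table, then use a cursor to loop through that temp table and put the unique contact IDs into a separate temp table.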

SELECT GroupId, ec.EmployeeId, ec.ContactId, c.ContactEmail
INTO #Temp1
FROM EmployeeContacts ec
INNER JOIN Contacts c ON c.ContactId = ec.ContactId
INNER JOIN Employee e ON e.EmployeeId = ec.EmployeeId
ORDER BY GroupId, c.ContactEmail

So for GroupId = 1, the temp table looks like this:

#Temp1

GroupId |   EmployeeId  |   ContactId   |   ContactEmail
========================================================
    1   |       101     |       11      |   c1@mail.com
    1   |       102     |       13      |   c1@mail.com
    1   |       101     |       12      |   c2@mail.com
    1   |       102     |       14      |   c3@mail.com
    1   |       103     |       15      |   c3@mail.com
--------------------------------------------------------

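Now I am thinking of using a cursor to loop through the #Temp1 table group by group. By comparing the 'ContactEmail' value of each contact, I will identify the unique contacts and write them to another temp table.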

CREATE TABLE #Temp2
(
    ContactId INT,
    GroupId INT
)

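-- Read rows in (GroupId, ContactEmail) order so that duplicate emails
-- within a group are adjacent; without ORDER BY the row order is not guaranteed
DECLARE cur CURSOR FOR
SELECT GroupId, ContactId, ContactEmail
FROM #Temp1
ORDER BY GroupId, ContactEmail, ContactId

DECLARE @ContactEmail NVARCHAR(50), @GroupId INT, @ContactId INT;
DECLARE @CompareEmail NVARCHAR(50), @CompareGroupId INT;

OPEN cur
FETCH NEXT FROM cur
INTO @GroupId, @ContactId, @ContactEmail

SET @CompareEmail = '';

WHILE @@FETCH_STATUS=0
BEGIN
    -- A new group or a new email marks the first row of a (group, email) run
    IF(@CompareGroupId IS NULL OR @CompareGroupId <> @GroupId OR @CompareEmail <> @ContactEmail)
    BEGIN
        SET @CompareGroupId = @GroupId
        SET @CompareEmail = @ContactEmail
        INSERT INTO #Temp2 (ContactId, GroupId) VALUES(@ContactId, @GroupId)
    END

    FETCH NEXT FROM cur
    INTO @GroupId, @ContactId, @ContactEmail
END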

CLOSE cur
DEALLOCATE cur

I have not run this yet, but I think it will give me a #Temp2 table with the unique contact IDs:

#Temp2

GroupId |   ContactId
=====================
    1   |       11
    1   |       12
    1   |       14
---------------------
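The cursor step above could also be sketched without a loop by numbering the rows of each (GroupId, ContactEmail) pair with ROW_NUMBER() and keeping the first one. This is only a minimal sketch: the #Temp1 rows are re-created here for group 1 so the snippet runs on its own, and it assumes that contact 13 shares the email c1@mail.com with contact 11 and that contact 12 has c2@mail.com, as the target result implies.

```sql
-- Re-create the group 1 sample rows so the snippet is self-contained.
-- DROP TABLE IF EXISTS needs SQL Server 2016 or later.
DROP TABLE IF EXISTS #Temp1;
DROP TABLE IF EXISTS #Temp2;

CREATE TABLE #Temp1
(
    GroupId      INT,
    EmployeeId   INT,
    ContactId    INT,
    ContactEmail NVARCHAR(50)
);

-- Emails for contacts 12 and 13 are an assumption (see the note above).
INSERT INTO #Temp1 (GroupId, EmployeeId, ContactId, ContactEmail)
VALUES (1, 101, 11, N'c1@mail.com'),
       (1, 102, 13, N'c1@mail.com'),
       (1, 101, 12, N'c2@mail.com'),
       (1, 102, 14, N'c3@mail.com'),
       (1, 103, 15, N'c3@mail.com');

-- Number the contacts that share an email within a group, lowest ContactId first,
-- and keep only the first contact of each (GroupId, ContactEmail) pair.
WITH Ranked AS
(
    SELECT GroupId,
           ContactId,
           ROW_NUMBER() OVER (PARTITION BY GroupId, ContactEmail
                              ORDER BY ContactId) AS rn
    FROM #Temp1
)
SELECT ContactId, GroupId
INTO #Temp2
FROM Ranked
WHERE rn = 1;

SELECT GroupId, ContactId
FROM #Temp2
ORDER BY GroupId, ContactId;
```

With these rows the final SELECT returns contacts 11, 12 and 14 for group 1. The lowest ContactId is kept for each email, which is a design choice of this sketch and not something the question prescribes.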

Now that I know which employee is active in each group, I can insert the new contact IDs into the 'EmployeeContacts' table.
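A minimal sketch of that final insert might look as follows. The temp tables #Employee and #EmployeeContacts are hypothetical stand-ins for the real tables, filled with the group 1 rows only, so the snippet can be run without touching real data. It assumes exactly one employee per group has IsDuplicate = 0.

```sql
-- Hypothetical stand-ins for the Employee and EmployeeContacts tables (group 1 only).
DROP TABLE IF EXISTS #Employee;
DROP TABLE IF EXISTS #EmployeeContacts;
DROP TABLE IF EXISTS #Temp2;

CREATE TABLE #Employee (GroupId INT, EmployeeId INT, IsDuplicate BIT);
CREATE TABLE #EmployeeContacts (EmployeeId INT, ContactId INT);
CREATE TABLE #Temp2 (ContactId INT, GroupId INT);

INSERT INTO #Employee (GroupId, EmployeeId, IsDuplicate)
VALUES (1, 101, 0), (1, 102, 1), (1, 103, 1);

INSERT INTO #EmployeeContacts (EmployeeId, ContactId)
VALUES (101, 11), (101, 12), (102, 13), (102, 14), (103, 15);

-- Unique contacts per group, as listed for #Temp2 in the question.
INSERT INTO #Temp2 (ContactId, GroupId)
VALUES (11, 1), (12, 1), (14, 1);

-- Attach every unique contact of a group to that group's active employee,
-- skipping (EmployeeId, ContactId) pairs that already exist.
INSERT INTO #EmployeeContacts (EmployeeId, ContactId)
SELECT e.EmployeeId, t.ContactId
FROM #Temp2 t
INNER JOIN #Employee e
        ON e.GroupId = t.GroupId
       AND e.IsDuplicate = 0
WHERE NOT EXISTS (SELECT 1
                  FROM #EmployeeContacts ec
                  WHERE ec.EmployeeId = e.EmployeeId
                    AND ec.ContactId = t.ContactId);

SELECT EmployeeId, ContactId
FROM #EmployeeContacts
WHERE EmployeeId = 101
ORDER BY ContactId;
```

For employee 101 the last SELECT returns contacts 11, 12 and 14. Note that the NOT EXISTS check compares contact IDs only. If the active employee already holds a different contact with the same email as a row in #Temp2, that row would still be inserted, so a check on the email would be needed in that case.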

But this whole process seems rather complicated and time-consuming. Is there a shorter or simpler way?

You can try the following approach to get the desired result:

-- @employee, @contacts and @employeecontacts are table variables holding the three tables
-- Loop over each group once
declare cur cursor fast_forward for select distinct GroupId from @employee
declare @groupid int, @employeeid int

open cur
fetch next from cur into @groupid

while (@@FETCH_STATUS = 0)
begin
    -- The active employee (IsDuplicate = 0) that receives the rolled-up contacts
    select @employeeid = EmployeeId from @employee where GroupId = @groupid and IsDuplicate = 0

    -- All contacts held by any employee of the current group
    select e.EmployeeId, IsDuplicate, c.*
    into #temp
    from @employee e
    join @employeecontacts ec on ec.EmployeeId = e.EmployeeId
    join @contacts c on c.ContactId = ec.ContactId
    where GroupId = @groupid

    -- Keep contacts whose email the active employee does not have yet,
    -- numbered per email so that only one contact per email is inserted
    ;with cte as
    (
        select @employeeid employeeid, t.ContactId, t.ContactEmail, ROW_NUMBER() over (partition by t.ContactEmail order by t.ContactId) rn
        from #temp t
        left join (select EmployeeId, x.ContactId, ContactEmail from @employeecontacts x join @contacts c on c.ContactId = x.ContactId) ec
        on ec.EmployeeId = @employeeid and ec.ContactEmail = t.ContactEmail
        where ec.ContactId is null
    )
    insert into @employeecontacts
    select employeeid, ContactId from cte where rn = 1

    drop table if exists #temp
    fetch next from cur into @groupid
end

close cur
deallocate cur

Please find the db<>fiddle here.
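As a possible cursor-free alternative, the same roll-up can be sketched as a single set-based INSERT. This is only a sketch under stated assumptions, not the method used in the answer above. The table variables are declared here to keep the example self-contained, with assumed column types and without the Name column. It assumes exactly one active employee per group. It also assumes that contacts 12 and 13 carry the emails c2@mail.com and c1@mail.com, as the target result implies.

```sql
-- Sample data from the question; column types are assumptions.
DECLARE @employee TABLE (GroupId INT, EmployeeId INT, IsDuplicate BIT);
DECLARE @contacts TABLE (ContactId INT, ContactEmail NVARCHAR(50));
DECLARE @employeecontacts TABLE (EmployeeId INT, ContactId INT);

INSERT INTO @employee (GroupId, EmployeeId, IsDuplicate)
VALUES (1, 101, 0), (1, 102, 1), (1, 103, 1),
       (2, 201, 1), (2, 202, 0),
       (3, 301, 1), (3, 302, 1), (3, 303, 0);

-- Emails for contacts 12 and 13 are assumed (see the note above).
INSERT INTO @contacts (ContactId, ContactEmail)
VALUES (11, N'c1@mail.com'), (12, N'c2@mail.com'), (13, N'c1@mail.com'),
       (14, N'c3@mail.com'), (15, N'c3@mail.com'),
       (21, N'd1@mail.com'), (22, N'd2@mail.com'),
       (31, N'e1@mail.com'), (32, N'e1@mail.com'), (33, N'e2@mail.com');

INSERT INTO @employeecontacts (EmployeeId, ContactId)
VALUES (101, 11), (101, 12), (102, 13), (102, 14), (103, 15),
       (201, 21), (202, 22),
       (301, 31), (302, 32), (303, 33);

-- For each active employee (a), look at the contacts of every employee (e)
-- in the same group, drop those whose email the active employee already has,
-- and keep one contact (lowest ContactId) per remaining email.
WITH Candidates AS
(
    SELECT a.EmployeeId,
           c.ContactId,
           ROW_NUMBER() OVER (PARTITION BY a.EmployeeId, c.ContactEmail
                              ORDER BY c.ContactId) AS rn
    FROM @employee a
    JOIN @employee e          ON e.GroupId = a.GroupId
    JOIN @employeecontacts ec ON ec.EmployeeId = e.EmployeeId
    JOIN @contacts c          ON c.ContactId = ec.ContactId
    WHERE a.IsDuplicate = 0
      AND NOT EXISTS (SELECT 1
                      FROM @employeecontacts x
                      JOIN @contacts xc ON xc.ContactId = x.ContactId
                      WHERE x.EmployeeId = a.EmployeeId
                        AND xc.ContactEmail = c.ContactEmail)
)
INSERT INTO @employeecontacts (EmployeeId, ContactId)
SELECT EmployeeId, ContactId
FROM Candidates
WHERE rn = 1;

SELECT EmployeeId, ContactId
FROM @employeecontacts
ORDER BY EmployeeId, ContactId;
```

With the sample data this adds the rows (101, 14), (202, 21) and (303, 31), which matches the target result in the question. Because duplicates are judged by email and not by ContactId, contact 13 is not added to employee 101, who already holds contact 11 with the same address.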