Identify Duplicates and Filter Unique Records From SQL Table
I have 3 tables in a SQL Server database.
Table 1: Employee
GroupId | EmployeeId | IsDuplicate
=======================================
1 | 101 | 0
1 | 102 | 1
1 | 103 | 1
--------|---------------|--------------
2 | 201 | 1
2 | 202 | 0
--------|---------------|--------------
3 | 301 | 1
3 | 302 | 1
3 | 303 | 0
---------------------------------------
Table 2: Contacts
ContactId | ContactEmail | Name |...
================================================
11 | c1@mail.com | c1 x |
12 | c2@mail.com | c2 y |
13 | c1@mail.com | c1.x |
14 | c3@mail.com | c3.z |
15 | c3@mail.com | c3 z |
------------|-------------------|-----------|
21 | d1@mail.com | d1 a |
22 | d2@mail.com | d2 b |
------------|-------------------|-----------|
31 | e1@mail.com | e1 m |
32 | e1@mail.com | e1m |
33 | e2@mail.com | e2 n |
--------------------------------------------
Table 3: EmployeeContacts
EmployeeId | ContactId
=========================
101 | 11
101 | 12
102 | 13
102 | 14
103 | 15
------------|------------
201 | 21
202 | 22
------------|------------
301 | 31
302 | 32
303 | 33
-------------------------
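Table 1 holds predetermined groups of employees that have been identified as duplicates of one another. One employee per group is kept (IsDuplicate = 0); the rest will be deleted later.
Each employee has one or more contacts. The contact details are stored in a separate Contacts table, and which employee has which contacts is stored in a separate EmployeeContacts table.
The Contacts table has several columns, but only the email address (ContactEmail) can be used to tell whether one contact duplicates another.
Problem statement:
I need to write a script that takes the unique contacts of each group and rolls them up to that group's active employee (IsDuplicate = 0).
In other words, for each group of employees I have to identify the duplicate contacts in the Contacts table, and link the unique ones that the active employee does not already have to that employee in the EmployeeContacts table.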
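The rule can be written down as a small reference implementation, handy for checking whatever SQL you end up with: for each group, take one contact per email across all of the group's employees (the lowest ContactId, which matches the expected result's choice of 14 over 15 and 31 over 32), drop the emails the active employee already has, and link the rest to the active employee. This Python sketch uses hypothetical in-memory lists and dicts in place of the three tables (all names are illustrative), and it assumes contacts 12 and 13 were meant to read c2@mail.com and c1@mail.com, which is what the expected result implies.

```python
from collections import defaultdict

# Sample data from the question (hypothetical in-memory stand-ins for the tables).
EMPLOYEES = [  # (GroupId, EmployeeId, IsDuplicate)
    (1, 101, 0), (1, 102, 1), (1, 103, 1),
    (2, 201, 1), (2, 202, 0),
    (3, 301, 1), (3, 302, 1), (3, 303, 0),
]
CONTACT_EMAIL = {  # ContactId -> ContactEmail (12 and 13 assume the "@" form)
    11: "c1@mail.com", 12: "c2@mail.com", 13: "c1@mail.com", 14: "c3@mail.com",
    15: "c3@mail.com", 21: "d1@mail.com", 22: "d2@mail.com",
    31: "e1@mail.com", 32: "e1@mail.com", 33: "e2@mail.com",
}
EMPLOYEE_CONTACTS = [  # (EmployeeId, ContactId)
    (101, 11), (101, 12), (102, 13), (102, 14), (103, 15),
    (201, 21), (202, 22), (301, 31), (302, 32), (303, 33),
]


def rows_to_add(employees, contact_email, employee_contacts):
    """Return the (EmployeeId, ContactId) links that roll each group's unique
    contacts up to that group's active employee (IsDuplicate = 0)."""
    group_of = {emp: grp for grp, emp, _ in employees}
    active = {grp: emp for grp, emp, dup in employees if dup == 0}

    # Lowest ContactId per (GroupId, email) is the representative contact.
    first_contact = {}
    for emp, contact in employee_contacts:
        key = (group_of[emp], contact_email[contact])
        if key not in first_contact or contact < first_contact[key]:
            first_contact[key] = contact

    # Emails that each group's active employee already has.
    already_has = defaultdict(set)
    for emp, contact in employee_contacts:
        grp = group_of[emp]
        if active.get(grp) == emp:
            already_has[grp].add(contact_email[contact])

    # Link every representative whose email the active employee lacks.
    result = []
    for (grp, email), contact in sorted(first_contact.items()):
        if grp in active and email not in already_has[grp]:
            result.append((active[grp], contact))
    return result


print(rows_to_add(EMPLOYEES, CONTACT_EMAIL, EMPLOYEE_CONTACTS))
```

The three pairs it returns are exactly the rows marked as rolled up in the target result below.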
So the target result is:
Table 3: EmployeeContacts
EmployeeId | ContactId
=========================
101 | 11
101 | 12
101 | 14 -- <-- Unique contact rolled up to the selected active employee
102 | 13
102 | 14
103 | 15
------------|------------
201 | 21
202 | 22
202 | 21 -- <-- Unique contact rolled up to the selected active employee
------------|------------
301 | 31
302 | 32
303 | 33
303 | 31 -- <-- Unique contact rolled up to the selected active employee
-------------------------
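I tried to solve it like this:
I join all 3 tables and put the contact email, group ID, contact ID and employee ID into a temp table, then loop over that temp table with a cursor and put the unique contact IDs into a second temp table: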
SELECT GroupId, ec.EmployeeId, ec.ContactId, c.ContactEmail
INTO #Temp1
FROM EmployeeContacts ec
INNER JOIN Contacts c ON c.ContactId = ec.ContactId
INNER JOIN Employee e ON e.EmployeeId = ec.EmployeeId
ORDER BY GroupId, c.ContactEmail
So for GroupId = 1, the temp table looks like this:
#Temp1
GroupId | EmployeeId | ContactId | ContactEmail
========================================================
1 | 101 | 11 | c1@mail.com
1 | 102 | 13 | c1@mail.com
1 | 101 | 12 | c2@mail.com
1 | 102 | 14 | c3@mail.com
1 | 103 | 15 | c3@mail.com
--------------------------------------------------------
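Next, I am thinking of using a cursor to walk through #Temp1 group by group; by comparing each contact's ContactEmail with the previous one, I can identify the unique contacts and write them to another temp table: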
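CREATE TABLE #Temp2
(
    ContactId INT,
    GroupId INT
)
DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT GroupId, ContactId, ContactEmail
    FROM #Temp1
    -- the cursor must impose the order; ORDER BY on SELECT ... INTO does not guarantee read order
    ORDER BY GroupId, ContactEmail, ContactId
DECLARE @ContactEmail NVARCHAR(50), @GroupId INT, @ContactId INT;
DECLARE @CompareEmail NVARCHAR(50), @CompareGroupId INT;
OPEN cur
FETCH NEXT FROM cur
INTO @GroupId, @ContactId, @ContactEmail
WHILE @@FETCH_STATUS = 0
BEGIN
    -- first row overall, first row of a new group, or a new email within the same group
    IF (@CompareGroupId IS NULL OR @CompareGroupId <> @GroupId OR @CompareEmail <> @ContactEmail)
    BEGIN
        SET @CompareGroupId = @GroupId
        SET @CompareEmail = @ContactEmail
        INSERT INTO #Temp2 (ContactId, GroupId) VALUES (@ContactId, @GroupId)
    END
    FETCH NEXT FROM cur
    INTO @GroupId, @ContactId, @ContactEmail
END
CLOSE cur
DEALLOCATE cur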
I haven't run it yet, but I expect it to give me a #Temp2 table with the unique contact IDs:
#Temp2
GroupId | ContactId
=====================
1 | 11
1 | 12
1 | 14
---------------------
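The cursor pass keeps the first contact it sees for each email within a group; with the rows ordered by ContactId, that is the lowest ContactId per (GroupId, ContactEmail), which a plain GROUP BY computes without any cursor. The sketch below runs that query against an in-memory SQLite copy of the sample data through Python's sqlite3 module; SQLite is only a stand-in for SQL Server here, `build_sample_db` is a hypothetical helper, and contact 13 is assumed to share contact 11's email, as the expected result implies. One difference to keep in mind: SQLite compares text case-sensitively by default, while a typical SQL Server collation does not; the sample emails are all lowercase, so it does not matter here.

```python
import sqlite3


def build_sample_db():
    """Hypothetical helper: the question's three tables in an in-memory SQLite DB."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Employee (GroupId INT, EmployeeId INT PRIMARY KEY, IsDuplicate INT);
        CREATE TABLE Contacts (ContactId INT PRIMARY KEY, ContactEmail TEXT);
        CREATE TABLE EmployeeContacts (EmployeeId INT, ContactId INT);
        INSERT INTO Employee VALUES (1,101,0),(1,102,1),(1,103,1),(2,201,1),(2,202,0),
                                    (3,301,1),(3,302,1),(3,303,0);
        INSERT INTO Contacts VALUES (11,'c1@mail.com'),(12,'c2@mail.com'),(13,'c1@mail.com'),
            (14,'c3@mail.com'),(15,'c3@mail.com'),(21,'d1@mail.com'),(22,'d2@mail.com'),
            (31,'e1@mail.com'),(32,'e1@mail.com'),(33,'e2@mail.com');
        INSERT INTO EmployeeContacts VALUES (101,11),(101,12),(102,13),(102,14),(103,15),
            (201,21),(202,22),(301,31),(302,32),(303,33);
    """)
    return con


# Set-based equivalent of the cursor: one (lowest) ContactId per (GroupId, email).
TEMP2_QUERY = """
    SELECT e.GroupId, MIN(ec.ContactId) AS ContactId
    FROM EmployeeContacts ec
    JOIN Contacts c ON c.ContactId = ec.ContactId
    JOIN Employee e ON e.EmployeeId = ec.EmployeeId
    GROUP BY e.GroupId, c.ContactEmail
    ORDER BY e.GroupId, MIN(ec.ContactId)
"""

con = build_sample_db()
temp2 = con.execute(TEMP2_QUERY).fetchall()
print(temp2)
```

For group 1 this yields 11, 12 and 14, the same rows as the #Temp2 listing above; the SELECT itself is also valid T-SQL.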
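Since I know which employee in each group is active, I can then insert the new contact IDs into the EmployeeContacts table.
But the whole process seems complicated and time-consuming. Is there a shorter or simpler way?
You can try the following to get the desired result: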
declare cur cursor local fast_forward for select distinct GroupId from Employee
declare @groupid int, @employeeid int
open cur
fetch next from cur into @groupid
while (@@FETCH_STATUS = 0)
begin
    -- reset so a group without an active employee does not reuse the previous group's id
    set @employeeid = null
    select @employeeid = EmployeeId from Employee where GroupId = @groupid and IsDuplicate = 0

    -- every contact of every employee in the current group
    select e.EmployeeId, e.IsDuplicate, c.ContactId, c.ContactEmail
    into #temp
    from Employee e
    join EmployeeContacts ec on ec.EmployeeId = e.EmployeeId
    join Contacts c on c.ContactId = ec.ContactId
    where e.GroupId = @groupid

    ;with cte as
    (
        -- drop emails the active employee already has, then number the rest per email
        select @employeeid employeeid, t.ContactId, t.ContactEmail,
               ROW_NUMBER() over (partition by t.ContactEmail order by t.ContactId) rn
        from #temp t
        left join (select x.EmployeeId, x.ContactId, c.ContactEmail
                   from EmployeeContacts x
                   join Contacts c on c.ContactId = x.ContactId) ec
            on ec.EmployeeId = @employeeid and ec.ContactEmail = t.ContactEmail
        where ec.ContactId is null
    )
    -- keep the lowest ContactId per email and link it to the active employee
    insert into EmployeeContacts (EmployeeId, ContactId)
    select employeeid, ContactId from cte where rn = 1 and employeeid is not null

    drop table if exists #temp
    fetch next from cur into @groupid
end
close cur
deallocate cur
See the db<>fiddle here.
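As a shorter alternative to both cursors, the whole roll-up can be sketched as a single set-based INSERT ... SELECT: join each group's contacts to that group's active employee, discard emails the active employee already has (NOT EXISTS), and keep MIN(ContactId) per email. This has not been run against SQL Server; the statement uses only constructs T-SQL also accepts, and here it runs against an in-memory SQLite copy of the sample data through Python's sqlite3 module. It assumes exactly one IsDuplicate = 0 employee per group and that contact 13 shares contact 11's email; `build_sample_db` is a hypothetical helper.

```python
import sqlite3


def build_sample_db():
    """Hypothetical helper: the question's three tables in an in-memory SQLite DB."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Employee (GroupId INT, EmployeeId INT PRIMARY KEY, IsDuplicate INT);
        CREATE TABLE Contacts (ContactId INT PRIMARY KEY, ContactEmail TEXT);
        CREATE TABLE EmployeeContacts (EmployeeId INT, ContactId INT);
        INSERT INTO Employee VALUES (1,101,0),(1,102,1),(1,103,1),(2,201,1),(2,202,0),
                                    (3,301,1),(3,302,1),(3,303,0);
        INSERT INTO Contacts VALUES (11,'c1@mail.com'),(12,'c2@mail.com'),(13,'c1@mail.com'),
            (14,'c3@mail.com'),(15,'c3@mail.com'),(21,'d1@mail.com'),(22,'d2@mail.com'),
            (31,'e1@mail.com'),(32,'e1@mail.com'),(33,'e2@mail.com');
        INSERT INTO EmployeeContacts VALUES (101,11),(101,12),(102,13),(102,14),(103,15),
            (201,21),(202,22),(301,31),(302,32),(303,33);
    """)
    return con


# One statement, no cursor: every group's contacts, minus the emails the
# group's active employee already has, one (lowest) ContactId per email.
ROLLUP_INSERT = """
    INSERT INTO EmployeeContacts (EmployeeId, ContactId)
    SELECT a.EmployeeId, MIN(c.ContactId)
    FROM Employee e
    JOIN EmployeeContacts ec ON ec.EmployeeId = e.EmployeeId
    JOIN Contacts c ON c.ContactId = ec.ContactId
    JOIN Employee a ON a.GroupId = e.GroupId AND a.IsDuplicate = 0
    WHERE NOT EXISTS (
        SELECT 1
        FROM EmployeeContacts ae
        JOIN Contacts ac ON ac.ContactId = ae.ContactId
        WHERE ae.EmployeeId = a.EmployeeId AND ac.ContactEmail = c.ContactEmail
    )
    GROUP BY a.EmployeeId, c.ContactEmail
"""

con = build_sample_db()
con.execute(ROLLUP_INSERT)
rows = con.execute(
    "SELECT EmployeeId, ContactId FROM EmployeeContacts ORDER BY EmployeeId, ContactId"
).fetchall()
print(rows)
```

Because the NOT EXISTS filter skips emails that are already linked to the active employee, running the statement a second time inserts nothing.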