Group rows together by multiple columns with an OR condition
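I have a table of people, and I need a way to find duplicate records under several possible scenarios. For example, group rows together if the first name, last name and address match, or if the first name, last name and date of birth match, or if the first name, last name and email match. I haven't been able to find a way to do this in SQL. The criteria above are only an example; the real grouping criteria will end up stricter. I've set up an example with data in SQL Fiddle. The result I want is for records 2-5 to be grouped together, while 1 and 6 remain unique rows.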
CREATE TABLE Persons (
    ID int IDENTITY(1,1),
    FirstName varchar(255),
    LastName varchar(255),
    Address1 varchar(255),
    City varchar(255),
    State varchar(255),
    BDay varchar(255),
    Email varchar(255)
);
INSERT INTO Persons
SELECT 'RICK', 'ALLEN', '44 Street', 'Minneapolis', 'MN', '1/2/1970', 'help@test.com'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '123 Street', 'Minneapolis', 'MN', '4/8/1980', 'test@test.com'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '123 Street', 'Minneapolis', 'MN', '4/8/1981', 'test@test.com'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '42 Street', 'Minneapolis', 'MN', '4/8/1980', 'test@test.com'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '123 Street', 'Minneapolis', 'MN', '4/8/1980', 'test2@test.com'
UNION ALL
SELECT 'STEVEN', 'ALLEN', '555 Street', 'Minneapolis', 'MN', '2/8/1980', 'help@test.com';
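To make the three OR'd criteria concrete, here is a small sketch that loads the sample rows and lists every pair of rows satisfying at least one of them. It assumes SQLite through Python's built-in sqlite3 module as a stand-in for SQL Server (IDs 1-6 come from an INTEGER PRIMARY KEY rather than IDENTITY), and the names ROWS, make_db and MATCH_PAIRS are hypothetical, chosen for illustration only.

```python
import sqlite3

# Sample rows from the question; INTEGER PRIMARY KEY assigns IDs 1..6 in
# insertion order, standing in for SQL Server's IDENTITY(1,1).
ROWS = [
    ("RICK", "ALLEN", "44 Street", "Minneapolis", "MN", "1/2/1970", "help@test.com"),
    ("JENNIFER", "ALLEN", "123 Street", "Minneapolis", "MN", "4/8/1980", "test@test.com"),
    ("JENNIFER", "ALLEN", "123 Street", "Minneapolis", "MN", "4/8/1981", "test@test.com"),
    ("JENNIFER", "ALLEN", "42 Street", "Minneapolis", "MN", "4/8/1980", "test@test.com"),
    ("JENNIFER", "ALLEN", "123 Street", "Minneapolis", "MN", "4/8/1980", "test2@test.com"),
    ("STEVEN", "ALLEN", "555 Street", "Minneapolis", "MN", "2/8/1980", "help@test.com"),
]


def make_db():
    """Create an in-memory Persons table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Persons (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,"
        " Address1 TEXT, City TEXT, State TEXT, BDay TEXT, Email TEXT)"
    )
    conn.executemany(
        "INSERT INTO Persons (FirstName, LastName, Address1, City, State, BDay, Email)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        ROWS,
    )
    return conn


# Same first and last name, plus at least one of address, birthday or email.
MATCH_PAIRS = """
SELECT p1.ID, p2.ID
FROM Persons p1
JOIN Persons p2
  ON p1.ID < p2.ID
 AND p1.FirstName = p2.FirstName
 AND p1.LastName = p2.LastName
 AND (p1.Address1 = p2.Address1 OR p1.BDay = p2.BDay OR p1.Email = p2.Email)
ORDER BY p1.ID, p2.ID
"""

pairs = make_db().execute(MATCH_PAIRS).fetchall()
print(pairs)  # [(2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
```

Rows 2-5 are all linked by at least one matching pair, while rows 1 and 6 appear in none, which is the grouping the question asks for.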
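You can use a not exists clause. It drops every row for which a row with a higher ID has the same first and last name plus a matching address, birthday or email; on the sample data that leaves rows 1, 5 and 6: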
select p1.*
from Persons p1
where not exists
(
    select *
    from Persons p2
    where p1.ID < p2.ID and
          p1.FirstName = p2.FirstName and
          p1.LastName = p2.LastName and
          (
              p1.Address1 = p2.Address1 or
              p1.BDay = p2.BDay or
              p1.Email = p2.Email
          )
)
Working example on SQL Fiddle.
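A sketch that runs the not exists query against the sample rows, assuming SQLite through Python's sqlite3 module in place of SQL Server; make_db and NOT_EXISTS are hypothetical helper names. The subquery selects 1 instead of *, which does not change the result because EXISTS ignores the select list.

```python
import sqlite3

# Sample rows from the question; INTEGER PRIMARY KEY assigns IDs 1..6.
ROWS = [
    ("RICK", "ALLEN", "44 Street", "Minneapolis", "MN", "1/2/1970", "help@test.com"),
    ("JENNIFER", "ALLEN", "123 Street", "Minneapolis", "MN", "4/8/1980", "test@test.com"),
    ("JENNIFER", "ALLEN", "123 Street", "Minneapolis", "MN", "4/8/1981", "test@test.com"),
    ("JENNIFER", "ALLEN", "42 Street", "Minneapolis", "MN", "4/8/1980", "test@test.com"),
    ("JENNIFER", "ALLEN", "123 Street", "Minneapolis", "MN", "4/8/1980", "test2@test.com"),
    ("STEVEN", "ALLEN", "555 Street", "Minneapolis", "MN", "2/8/1980", "help@test.com"),
]


def make_db():
    """Create an in-memory Persons table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Persons (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,"
        " Address1 TEXT, City TEXT, State TEXT, BDay TEXT, Email TEXT)"
    )
    conn.executemany(
        "INSERT INTO Persons (FirstName, LastName, Address1, City, State, BDay, Email)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        ROWS,
    )
    return conn


# Keep a row only if no later row (higher ID) matches it under any criterion.
NOT_EXISTS = """
SELECT p1.ID
FROM Persons p1
WHERE NOT EXISTS (
    SELECT 1
    FROM Persons p2
    WHERE p1.ID < p2.ID
      AND p1.FirstName = p2.FirstName
      AND p1.LastName = p2.LastName
      AND (p1.Address1 = p2.Address1 OR p1.BDay = p2.BDay OR p1.Email = p2.Email)
)
ORDER BY p1.ID
"""

kept = [row[0] for row in make_db().execute(NOT_EXISTS)]
print(kept)  # [1, 5, 6]
```

Because the comparison is p1.ID < p2.ID, the survivor of the 2-5 group is its highest ID, row 5; flipping it to p1.ID > p2.ID would keep row 2 instead.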
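In reply to your comment: you can flag the duplicates in the table with an update query. It sets DupeOfID on each duplicate row to the lowest ID of an earlier row it matches, and assumes Persons has a DupeOfID column, which the CREATE TABLE above does not define: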
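-- Requires a nullable DupeOfID column; add it first, in its own batch, e.g.:
-- alter table Persons add DupeOfID int;
with dupe as
(
    -- For each duplicate row p2, the lowest ID of an earlier matching row p1
    select min(p1.ID) as OriginalID
         , p2.ID as DupeID
    from Persons p1
    join Persons p2
        on p1.ID < p2.ID and
           p1.FirstName = p2.FirstName and
           p1.LastName = p2.LastName and
           (
               p1.Address1 = p2.Address1 or
               p1.BDay = p2.BDay or
               p1.Email = p2.Email
           )
    group by
        p2.ID
)
update p1
set DupeOfID = dupe.OriginalID
from Persons p1
join dupe
    on dupe.DupeID = p1.ID;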
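A sketch of the same flagging step, assuming SQLite through Python's sqlite3 module. Since SQLite only supports UPDATE ... FROM from version 3.33 on, the sketch runs the CTE's select on its own and applies the resulting pairs with a parameterized UPDATE. make_db and DUPE_PAIRS are hypothetical names. COALESCE(DupeOfID, ID) then gives every row a group key.

```python
import sqlite3

# Sample rows from the question; INTEGER PRIMARY KEY assigns IDs 1..6.
ROWS = [
    ("RICK", "ALLEN", "44 Street", "Minneapolis", "MN", "1/2/1970", "help@test.com"),
    ("JENNIFER", "ALLEN", "123 Street", "Minneapolis", "MN", "4/8/1980", "test@test.com"),
    ("JENNIFER", "ALLEN", "123 Street", "Minneapolis", "MN", "4/8/1981", "test@test.com"),
    ("JENNIFER", "ALLEN", "42 Street", "Minneapolis", "MN", "4/8/1980", "test@test.com"),
    ("JENNIFER", "ALLEN", "123 Street", "Minneapolis", "MN", "4/8/1980", "test2@test.com"),
    ("STEVEN", "ALLEN", "555 Street", "Minneapolis", "MN", "2/8/1980", "help@test.com"),
]


def make_db():
    """Create an in-memory Persons table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Persons (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,"
        " Address1 TEXT, City TEXT, State TEXT, BDay TEXT, Email TEXT)"
    )
    conn.executemany(
        "INSERT INTO Persons (FirstName, LastName, Address1, City, State, BDay, Email)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        ROWS,
    )
    return conn


# Body of the "dupe" CTE: (OriginalID, DupeID) for each duplicate row.
DUPE_PAIRS = """
SELECT MIN(p1.ID) AS OriginalID, p2.ID AS DupeID
FROM Persons p1
JOIN Persons p2
  ON p1.ID < p2.ID
 AND p1.FirstName = p2.FirstName
 AND p1.LastName = p2.LastName
 AND (p1.Address1 = p2.Address1 OR p1.BDay = p2.BDay OR p1.Email = p2.Email)
GROUP BY p2.ID
ORDER BY p2.ID
"""

conn = make_db()
# The question's table has no DupeOfID column; NULL means "not a duplicate".
conn.execute("ALTER TABLE Persons ADD COLUMN DupeOfID INTEGER")
dupes = conn.execute(DUPE_PAIRS).fetchall()
# Each pair binds as (OriginalID, DupeID) -> SET DupeOfID = ? WHERE ID = ?
conn.executemany("UPDATE Persons SET DupeOfID = ? WHERE ID = ?", dupes)

# Rows that are not duplicates serve as their own group key.
groups = conn.execute(
    "SELECT ID, COALESCE(DupeOfID, ID) AS GroupID FROM Persons ORDER BY ID"
).fetchall()
print(dupes)   # [(2, 3), (2, 4), (2, 5)]
print(groups)  # [(1, 1), (2, 2), (3, 2), (4, 2), (5, 2), (6, 6)]
```

Each duplicate points at the lowest earlier ID it matches directly; chains of matches are not followed transitively. That is enough here because row 2 matches every other JENNIFER row, but with other data two rows of one logical group could end up pointing at different originals.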