How to avoid returning duplicates when comparing records in the same table (A:B and B:A)
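I have been stuck on this problem for a while and cannot solve it, so any guidance would be much appreciated.
I am comparing the records of a persons table to see whether any of them may be the same person. To do this, I use a WITH clause to get the values I need and look for matches: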
CREATE TABLE persons (
serialno VARCHAR(20) NOT NULL,
given VARCHAR(30) NOT NULL,
family VARCHAR(30) NOT NULL,
dob DATE NOT NULL,
gender VARCHAR2(20 BYTE),
address VARCHAR2(64 BYTE)
);
-- DATE literals avoid depending on the session's NLS_DATE_FORMAT
INSERT ALL
INTO persons ( serialno, given, family, dob, gender, address ) VALUES ( '001', 'Mick', 'Dundon', DATE '1970-01-01', 'Male', 'Main St' )
INTO persons ( serialno, given, family, dob, gender, address ) VALUES ( '002', 'Mick', 'Dundon', DATE '1970-01-01', 'Male', 'Montague St' )
INTO persons ( serialno, given, family, dob, gender, address ) VALUES ( '003', 'Dave', 'Doyle', DATE '1981-10-13', 'Male', 'Rathmines' )
INTO persons ( serialno, given, family, dob, gender, address ) VALUES ( '004', 'Jim', 'Morrison', DATE '1956-08-15', 'Male', 'Newtown' )
INTO persons ( serialno, given, family, dob, gender, address ) VALUES ( '005', 'Sam', 'Wise', DATE '1992-12-12', 'Male', 'High St' )
SELECT 1 FROM dual;
with rec as
(select serialno,given,family,dob,gender,address
from persons)
select *
from rec r1
join rec r2
on r1.given = r2.given
and r1.family = r2.family
and r1.gender = r2.gender
and r1.serialno <> r2.serialno
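The code works fine, except that I end up with duplicates: a record returned as R1 shows up again further down the output as R2, and vice versa.
Is there a simple way to avoid this duplication?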
You can use the analytic COUNT function to get all the duplicates without a self-join:
SELECT serialno, given, family, dob, gender, address
FROM (
SELECT serialno, given, family, dob, gender, address,
COUNT(*) OVER (PARTITION BY given, family, gender) AS num_matches
FROM persons
)
WHERE num_matches > 1;
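If you also want to compare the values with the row that has the same given/family/gender combination and the lowest serialno, then you can again avoid a self-join by using analytic functions: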
SELECT serialno, given, family, dob, gender, address,
min_serialno, min_dob, min_address
FROM (
SELECT serialno,
given,
family,
dob,
gender,
address,
MIN(serialno) OVER (PARTITION BY given, family, gender) AS min_serialno,
MIN(dob) KEEP (DENSE_RANK FIRST ORDER BY serialno)
OVER (PARTITION BY given, family, gender) AS min_dob,
MIN(address) KEEP (DENSE_RANK FIRST ORDER BY serialno)
OVER (PARTITION BY given, family, gender) AS min_address
FROM persons
)
WHERE serialno > min_serialno;
In Oracle, if you want to get all the possible combinations, then you can use a hierarchical query to avoid the self-join:
SELECT serialno, given, family, dob, gender, address,
PRIOR serialno AS p_serialno,
PRIOR dob AS p_dob,
PRIOR address AS p_address
FROM persons
WHERE LEVEL = 2
CONNECT BY
PRIOR gender = gender
AND PRIOR given = given
AND PRIOR family = family
AND PRIOR serialno < serialno;
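If a self-join is acceptable, the ordering condition used in the hierarchical query above can be applied directly to the join from the question. The following is only a minimal sketch against the sample persons table: replacing <> with < means each pair is reported once, with the lower serialno on the left. The column aliases (serialno_1, address_1, serialno_2, address_2) are names chosen here for illustration and are not part of the original post.

```sql
-- Sketch: same join as in the question, but with an ordering
-- condition (<) instead of inequality (<>), so the pair A:B is
-- returned while its mirror image B:A is filtered out.
SELECT r1.serialno AS serialno_1,   -- illustrative alias
       r1.address  AS address_1,    -- illustrative alias
       r2.serialno AS serialno_2,   -- illustrative alias
       r2.address  AS address_2,    -- illustrative alias
       r1.given,
       r1.family,
       r1.gender
FROM   persons r1
       JOIN persons r2
         ON  r1.given    = r2.given
         AND r1.family   = r2.family
         AND r1.gender   = r2.gender
         AND r1.serialno < r2.serialno;
```

With the sample data this should return a single row pairing serialno 001 with 002. As in the original query, rows with a NULL gender never match each other, because NULL = NULL does not evaluate to true.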