Trying to delete duplicate rows in SQL Server where the difference is the date or batch number
I have this query:
SELECT
    T1.ID_NUMBER,
    T1.INCEPTION_DATE,
    T1.OCCURRENCE,
    T1.TRANSACTION_DATE,
    T1.FILE_LOAD_DATE,
    T1.BATCH_NUM
FROM
    mastertable T1
INNER JOIN
    (SELECT
         ID_NUMBER, INCEPTION_DATE, OCCURRENCE,
         COUNT(*) AS DUPL_COUNT
     FROM
         mastertable
     WHERE
         SOURCE_SYSTEM = 'LEGACY'
     GROUP BY
         ID_NUMBER, INCEPTION_DATE, OCCURRENCE
     HAVING
         COUNT(*) > 1) T2 ON T2.ID_NUMBER = T1.ID_NUMBER
                      AND T2.INCEPTION_DATE = T1.INCEPTION_DATE
                      AND T2.OCCURRENCE = T1.OCCURRENCE
ORDER BY
    1, 2, 3, 4, 5
which returns the following results:
ID_NUMBER | INCEPTION_DATE | OCCURRENCE | TRANSACTION_DATE | FILE_LOAD_DATE | BATCH_NUM |
---|---|---|---|---|---|
112897732 | 2008-09-15 | 4 | 2008-07-03 | 2008-07-07 17:57:19 | 06341 |
112897732 | 2008-09-15 | 4 | 2008-07-13 | 2008-07-18 03:35:55 | 06753 |
828194721 | 2008-11-11 | 1 | 2008-09-06 | 2008-09-17 02:50:44 | 97334 |
828194721 | 2008-11-11 | 1 | 2008-09-23 | 2008-09-24 02:55:27 | 98331 |
456457422 | 2008-09-28 | 1 | 2008-12-03 | 2008-07-13 08:08:39 | 00734 |
456457422 | 2008-09-28 | 1 | 2008-12-03 | 2008-07-18 13:35:55 | 00991 |
999272910 | 2008-05-07 | 3 | 2008-05-03 | 2008-10-13 08:08:38 | 11432 |
999272910 | 2008-05-07 | 3 | 2008-05-28 | 2008-10-18 03:35:55 | 13342 |
875328642 | 2008-03-01 | 3 | 2008-04-28 | 2008-01-23 08:08:38 | 74542 |
875328642 | 2008-03-01 | 3 | 2008-04-30 | 2008-01-25 12:55:11 | 77536 |
011028734 | 2008-07-12 | 2 | 2008-12-03 | 2008-08-07 11:57:03 | 23422 |
011028734 | 2008-07-12 | 2 | 2008-12-03 | 2008-08-11 17:23:29 | 25748 |
018264981 | 2008-07-09 | 0 | 2008-12-03 | 2008-12-07 02:18:12 | 00432 |
018264981 | 2008-07-09 | 0 | 2008-12-03 | 2008-12-11 17:44:19 | 00773 |
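For each ID_NUMBER, the record with the earlier FILE_LOAD_DATE or the smaller BATCH_NUM is the one I want to keep.
Is there a way to write a query that deletes the other records, perhaps using a CTE with ROW_NUMBER()?
I'm hoping for something DRY in case this problem happens again. Thanks!
(Also, if it's not too much trouble, please explain how the solution works.)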
You can use a deletable CTE here:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE
                                 ORDER BY FILE_LOAD_DATE, BATCH_NUM) AS rn
    FROM mastertable
    WHERE SOURCE_SYSTEM = 'LEGACY'
)
DELETE
FROM cte
WHERE rn > 1;
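The logic is to assign a row number within each group of records that share the same ID_NUMBER, INCEPTION_DATE, and OCCURRENCE values. Row number 1 goes to the record with the earliest FILE_LOAD_DATE. If two or more records tie for the earliest FILE_LOAD_DATE, the tie is broken by the smallest BATCH_NUM.
The DELETE statement then removes every record in the group except this earliest one. Because the CTE selects from a single table, deleting from cte deletes the matching rows from mastertable itself.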
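If you want to try the ranking logic without touching the real table, here is a minimal sketch that replays it against an in-memory SQLite database through Python's `sqlite3` module. Several things here are assumptions, not part of the original setup: the column types (text for ID_NUMBER and BATCH_NUM so the leading zeros survive, ISO-formatted text for the dates), the helper names `build_demo` and `dedupe_keep_earliest`, and the use of SQLite itself. As far as I know SQLite does not accept a CTE as the target of a DELETE, so the sketch deletes by `rowid` instead; on SQL Server the `DELETE FROM cte` form is the one to use. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# A few rows copied from the question's result set:
# (ID_NUMBER, INCEPTION_DATE, OCCURRENCE, TRANSACTION_DATE, FILE_LOAD_DATE, BATCH_NUM)
ROWS = [
    ("112897732", "2008-09-15", 4, "2008-07-03", "2008-07-07 17:57:19", "06341"),
    ("112897732", "2008-09-15", 4, "2008-07-13", "2008-07-18 03:35:55", "06753"),
    ("456457422", "2008-09-28", 1, "2008-12-03", "2008-07-13 08:08:39", "00734"),
    ("456457422", "2008-09-28", 1, "2008-12-03", "2008-07-18 13:35:55", "00991"),
]

# Same PARTITION BY / ORDER BY as the CTE in the answer.
# SQLite workaround (assumption): delete by rowid rather than through the CTE.
DEDUPE_SQL = """
DELETE FROM mastertable
WHERE rowid IN (
    SELECT rid FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE
                   ORDER BY FILE_LOAD_DATE, BATCH_NUM
               ) AS rn
        FROM mastertable
        WHERE SOURCE_SYSTEM = 'LEGACY'
    )
    WHERE rn > 1
)
"""


def build_demo():
    """Create an in-memory table with assumed column types and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE mastertable (
               ID_NUMBER TEXT, INCEPTION_DATE TEXT, OCCURRENCE INTEGER,
               TRANSACTION_DATE TEXT, FILE_LOAD_DATE TEXT, BATCH_NUM TEXT,
               SOURCE_SYSTEM TEXT)"""
    )
    conn.executemany(
        "INSERT INTO mastertable VALUES (?, ?, ?, ?, ?, ?, 'LEGACY')", ROWS
    )
    return conn


def dedupe_keep_earliest(conn):
    """Delete every row ranked after the first in its group; return the number deleted."""
    cur = conn.execute(DEDUPE_SQL)
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo()
    print("rows deleted:", dedupe_keep_earliest(conn))
    for row in conn.execute(
        "SELECT ID_NUMBER, FILE_LOAD_DATE, BATCH_NUM FROM mastertable ORDER BY ID_NUMBER"
    ):
        print(row)
```

Keeping the statement in one place and calling it from a function is one way to get the "DRY" behavior asked for: the same call can be rerun whenever duplicates reappear, and it is a no-op when every group already has a single row.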