How to update older "duplicate" records (duplicate except for the date column)
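We have a table that, in this example, links each subscriber to the demographic questions (questionID) they answered, with a date recording when the subscriber answered each question. In some cases a subscriber answered the same question again later, so we now have several records for the same subscriberID and questionID that differ only in dateAnswered (see the sample data):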
subscriberID questionID  dateAnswered            isDeleted
------------ ----------- ----------------------- ---------
100          559         2015-07-29 13:07:26.153 0
100          560         2015-07-29 13:07:26.153 0
100          561         2015-07-29 13:07:26.153 0
100          562         2015-07-29 13:07:26.153 0
100          575         2015-07-29 13:07:26.153 0
102          559         2015-07-30 15:12:46.143 0
102          564         2015-07-30 15:12:46.143 0
102          588         2015-07-30 15:12:46.143 0
102          559         2015-07-31 16:11:53.323 0
114          575         2015-08-21 11:27:14.253 0
114          588         2015-08-21 11:27:14.253 0
114          560         2015-08-21 11:27:14.253 0
114          588         2015-08-24 05:44:42.030 0
114          562         2015-08-21 11:27:14.253 0
114          575         2015-08-24 05:44:42.030 0
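The application that stores the answers was supposed to mark the older records as "deleted" (set isDeleted = 1), but it didn't, and I now need to clean up the old records.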
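This seems like it should be simple, but it has me stumped. (a) How do I select the records where the same subscriberID and questionID appear more than once with different answer dates? (b) How do I write an update that sets isDeleted = 1 on every record except the most recent one for each subscriber and question?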
Any help would be greatly appreciated! I suspect a self-join might be in order, but I haven't figured it out yet. Hence the question!
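The answers that follow focus on part (b). For part (a), the self-join the question suspects does work: group by (subscriberID, questionID), keep the groups with HAVING COUNT(*) > 1, and join that back to the table. A minimal sketch, assuming an in-memory SQLite database driven by Python's sqlite3 module as a stand-in for SQL Server; the schema, the helper names load_sample and find_duplicate_answers, and storing dateAnswered as ISO-8601 text are all illustrative assumptions.

```python
import sqlite3

# Rows from the question's sample data: (subscriberID, questionID, dateAnswered)
SAMPLE = [
    (100, 559, "2015-07-29 13:07:26.153"),
    (100, 560, "2015-07-29 13:07:26.153"),
    (100, 561, "2015-07-29 13:07:26.153"),
    (100, 562, "2015-07-29 13:07:26.153"),
    (100, 575, "2015-07-29 13:07:26.153"),
    (102, 559, "2015-07-30 15:12:46.143"),
    (102, 564, "2015-07-30 15:12:46.143"),
    (102, 588, "2015-07-30 15:12:46.143"),
    (102, 559, "2015-07-31 16:11:53.323"),
    (114, 575, "2015-08-21 11:27:14.253"),
    (114, 588, "2015-08-21 11:27:14.253"),
    (114, 560, "2015-08-21 11:27:14.253"),
    (114, 588, "2015-08-24 05:44:42.030"),
    (114, 562, "2015-08-21 11:27:14.253"),
    (114, 575, "2015-08-24 05:44:42.030"),
]


def load_sample() -> sqlite3.Connection:
    """Build a hypothetical in-memory copy of the question's table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TableName (subscriberID INTEGER, questionID INTEGER, "
        "dateAnswered TEXT, isDeleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO TableName (subscriberID, questionID, dateAnswered) VALUES (?, ?, ?)",
        SAMPLE,
    )
    return conn


def find_duplicate_answers(conn: sqlite3.Connection) -> list:
    """(a) Return every row whose (subscriberID, questionID) pair occurs more than once."""
    return conn.execute(
        """
        SELECT t.subscriberID, t.questionID, t.dateAnswered
        FROM TableName AS t
        JOIN (SELECT subscriberID, questionID
              FROM TableName
              GROUP BY subscriberID, questionID
              HAVING COUNT(*) > 1) AS d          -- pairs answered more than once
          ON t.subscriberID = d.subscriberID
         AND t.questionID = d.questionID
        ORDER BY t.subscriberID, t.questionID, t.dateAnswered
        """
    ).fetchall()


if __name__ == "__main__":
    for row in find_duplicate_answers(load_sample()):
        print(row)
```

On the sample data this lists both rows for each of the pairs (102, 559), (114, 575) and (114, 588); subscriber 100 never repeats a question, so none of its rows appear.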
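-- Number each subscriber's answers to each question, newest first (rn = 1 is the latest)
;WITH X AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY
subscriberID, questionID
ORDER BY dateAnswered DESC) rn
, *
FROM TableName
)
-- Updating the CTE updates TableName itself: flag every row except the newest one.
-- To preview the affected rows first (part a), run SELECT * FROM X WHERE rn > 1 instead.
UPDATE X
SET isDeleted = 1
WHERE rn > 1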
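The ROW_NUMBER() approach above can be checked against the sample data with a minimal sketch, assuming an in-memory SQLite database through Python's sqlite3 module rather than SQL Server (window functions need SQLite 3.25 or newer). SQLite cannot use a CTE as an UPDATE target the way T-SQL can, so this version matches the numbered rows back to the table through rowid. The schema, the helper names, and storing dateAnswered as ISO-8601 text (which sorts chronologically) are assumptions for illustration.

```python
import sqlite3

# Rows from the question's sample data: (subscriberID, questionID, dateAnswered)
SAMPLE = [
    (100, 559, "2015-07-29 13:07:26.153"),
    (100, 560, "2015-07-29 13:07:26.153"),
    (100, 561, "2015-07-29 13:07:26.153"),
    (100, 562, "2015-07-29 13:07:26.153"),
    (100, 575, "2015-07-29 13:07:26.153"),
    (102, 559, "2015-07-30 15:12:46.143"),
    (102, 564, "2015-07-30 15:12:46.143"),
    (102, 588, "2015-07-30 15:12:46.143"),
    (102, 559, "2015-07-31 16:11:53.323"),
    (114, 575, "2015-08-21 11:27:14.253"),
    (114, 588, "2015-08-21 11:27:14.253"),
    (114, 560, "2015-08-21 11:27:14.253"),
    (114, 588, "2015-08-24 05:44:42.030"),
    (114, 562, "2015-08-21 11:27:14.253"),
    (114, 575, "2015-08-24 05:44:42.030"),
]


def load_sample() -> sqlite3.Connection:
    """Build a hypothetical in-memory copy of the question's table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TableName (subscriberID INTEGER, questionID INTEGER, "
        "dateAnswered TEXT, isDeleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO TableName (subscriberID, questionID, dateAnswered) VALUES (?, ?, ?)",
        SAMPLE,
    )
    return conn


def flag_older_answers(conn: sqlite3.Connection) -> int:
    """(b) Set isDeleted = 1 on all but the newest row per (subscriberID, questionID)."""
    cur = conn.execute(
        """
        UPDATE TableName
        SET isDeleted = 1
        WHERE rowid IN (
            SELECT rid
            FROM (SELECT rowid AS rid,
                         ROW_NUMBER() OVER (PARTITION BY subscriberID, questionID
                                            ORDER BY dateAnswered DESC) AS rn
                  FROM TableName)
            WHERE rn > 1                         -- rn = 1 is the latest answer
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows that were flagged


if __name__ == "__main__":
    print(flag_older_answers(load_sample()), "rows flagged")
```

On the sample data this flags the three earlier answers to (102, 559), (114, 575) and (114, 588) and leaves the other twelve rows untouched.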
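Here is another approach. The update below affects every record that is not already marked as deleted, except each subscriber's most recent answer to each question.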
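-- Latest answer date for every (subscriberID, questionID) pair
WITH LastAnswers AS
(
SELECT subscriberID ,questionID , MAX(dateAnswered) AS LastAnsweredDate
FROM TableName
GROUP BY subscriberID ,questionID
)
-- Rows whose date is not their pair's latest date find no match in the LEFT JOIN,
-- so LastAnsweredDate comes back NULL; flag those, skipping rows already marked deleted
UPDATE TableName
SET TableName.isDeleted = 1
FROM
TableName
LEFT JOIN LastAnswers
ON TableName.subscriberID = LastAnswers.subscriberID
AND TableName.questionID = LastAnswers.questionID
AND TableName.dateAnswered = LastAnswers.LastAnsweredDate
WHERE LastAnswers.LastAnsweredDate IS NULL AND TableName.isDeleted = 0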
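The MAX(dateAnswered) approach above can be sketched the same way, again assuming an in-memory SQLite database through Python's sqlite3 module rather than SQL Server. SQLite only added UPDATE ... FROM in version 3.33, so this stand-in (not the answer's exact T-SQL) swaps the LEFT JOIN for a correlated MAX() subquery: a row is flagged when its date is earlier than the latest date for its pair. That matches the LEFT JOIN ... IS NULL test as long as dateAnswered is never NULL. The schema and helper names are hypothetical.

```python
import sqlite3

# Rows from the question's sample data: (subscriberID, questionID, dateAnswered)
SAMPLE = [
    (100, 559, "2015-07-29 13:07:26.153"),
    (100, 560, "2015-07-29 13:07:26.153"),
    (100, 561, "2015-07-29 13:07:26.153"),
    (100, 562, "2015-07-29 13:07:26.153"),
    (100, 575, "2015-07-29 13:07:26.153"),
    (102, 559, "2015-07-30 15:12:46.143"),
    (102, 564, "2015-07-30 15:12:46.143"),
    (102, 588, "2015-07-30 15:12:46.143"),
    (102, 559, "2015-07-31 16:11:53.323"),
    (114, 575, "2015-08-21 11:27:14.253"),
    (114, 588, "2015-08-21 11:27:14.253"),
    (114, 560, "2015-08-21 11:27:14.253"),
    (114, 588, "2015-08-24 05:44:42.030"),
    (114, 562, "2015-08-21 11:27:14.253"),
    (114, 575, "2015-08-24 05:44:42.030"),
]


def load_sample() -> sqlite3.Connection:
    """Build a hypothetical in-memory copy of the question's table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TableName (subscriberID INTEGER, questionID INTEGER, "
        "dateAnswered TEXT, isDeleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO TableName (subscriberID, questionID, dateAnswered) VALUES (?, ?, ?)",
        SAMPLE,
    )
    return conn


def flag_all_but_latest(conn: sqlite3.Connection) -> int:
    """(b) Flag not-yet-deleted rows that are older than their pair's latest answer."""
    cur = conn.execute(
        """
        UPDATE TableName
        SET isDeleted = 1
        WHERE isDeleted = 0
          AND dateAnswered < (SELECT MAX(t2.dateAnswered)      -- latest date for this pair
                              FROM TableName AS t2
                              WHERE t2.subscriberID = TableName.subscriberID
                                AND t2.questionID = TableName.questionID)
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows that were flagged


if __name__ == "__main__":
    conn = load_sample()
    print(flag_all_but_latest(conn), "rows flagged")
```

Because of the isDeleted = 0 filter, running the update a second time flags nothing, which makes it safe to rerun after the application writes more answers.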