How to remove duplicate rows in SQL Server?

Environment:

Problem:

I have a large table with 140 million rows. Some of the rows are duplicates, so I want to remove them. For example:

id   name   value   timestamp
---------------------------------------
001  dummy1 10      2015-7-27 10:00:00
002  dummy1 10      2015-7-27 10:00:00    <-- duplicate
003  dummy1 20      2015-7-27 10:00:00

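The second row is considered a duplicate because it has the same name, value, and timestamp as the first row, even though its id is different.

Note: the first two rows are duplicates NOT because all of their columns are identical, but because of this custom rule.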

I tried to use a window function to remove such duplicates:

select 
    id, name, value, timestamp
from
   (select 
        id, name, value, timestamp,
        DATEDIFF(SECOND, lag(timestamp, 1) over (partition by name order by timestamp),
        timestamp) [TimeDiff]
    from table) tab

But after running for an hour, it ran out of locks and raised an error:

Msg 1204, Level 19, State 4, Line 2
The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.

How can I remove these duplicate rows efficiently?

How about using a CTE? Something like this.

with DeDupe as
(
    select id
        , [name]
        , [value]
        , [timestamp]
        , ROW_NUMBER() over (partition by [name], [value], [timestamp] order by id) as RowNum
    from SomeTable
)

Delete DeDupe
where RowNum > 1;
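
On a table with 140 million rows, a single DELETE like this may run into the same lock limit as the query in the question. One way around that is to delete in batches, so each statement only holds a bounded number of locks. A rough sketch, assuming the same SomeTable as above; the @BatchSize value and the variable names are placeholders, so tune them for your system.

```sql
-- Hypothetical batch size; adjust to what your server handles comfortably.
DECLARE @BatchSize int = 10000;
DECLARE @Rows int = 1;

-- Keep deleting until a pass finds no more duplicates.
WHILE @Rows > 0
BEGIN
    WITH DeDupe AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY [name], [value], [timestamp]
                                  ORDER BY id) AS RowNum
        FROM SomeTable
    )
    -- Remove at most @BatchSize duplicate rows per pass;
    -- the row with the lowest id in each group always has RowNum = 1 and is kept.
    DELETE TOP (@BatchSize) FROM DeDupe
    WHERE RowNum > 1;

    -- Number of rows removed by the DELETE just above.
    SET @Rows = @@ROWCOUNT;
END;
```

Each pass recomputes ROW_NUMBER() over the whole table, so an index on ([name], [value], [timestamp], id) would probably help. That is an assumption, not something measured on a table of this size.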

Try something like this: find the lowest ID for each group of values, then delete the rows whose ID is not the lowest.

Select Name, Value, TimeStamp, min(ID) as LowestID
into #temp1
From MyTable
group by Name, Value, TimeStamp

Delete a
from MyTable a
inner join #temp1 b
on a.Name = b.Name 
  and a.Value = b.Value 
  and a.Timestamp = b.timestamp 
  and a.ID <> b.LowestID

If you only need to select the non-duplicate rows from the table, consider using this script:

SELECT MIN(id), name, value, timestamp FROM table GROUP BY name, value, timestamp

If you need to delete the duplicate rows:

DELETE FROM table  WHERE id NOT IN ( SELECT MIN(id) FROM table GROUP BY name, value, timestamp)

DELETE t FROM table t INNER JOIN 
table t2  ON
t.name=t2.name AND 
t.value=t2.value AND 
t.timestamp=t2.timestamp AND 
t2.id<t.id
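
Whichever delete you use, a quick sanity check afterwards is to look for groups that still have more than one row. A minimal sketch, with MyTable standing in for your real table name; it should return no rows once the duplicates are gone.

```sql
-- Lists every (name, value, timestamp) combination that still occurs more than once.
-- MyTable is a placeholder for the actual table name.
SELECT [name], [value], [timestamp], COUNT(*) AS Copies
FROM MyTable
GROUP BY [name], [value], [timestamp]
HAVING COUNT(*) > 1;
```

Running the same query before the delete also tells you how many groups are affected.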