How to remove duplicate rows in SQL Server?
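Environment:
- OS: Windows Server 2012 Datacenter
- DBMS: SQL Server 2012
- Hardware (VPS): Xeon E5530, 4 cores + 4 GB RAM
Problem:
I have a large table with 140 million rows. Some of the rows are supposed to be duplicates, so I want to remove them. For example: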
id name value timestamp
---------------------------------------
001 dummy1 10 2015-7-27 10:00:00
002 dummy1 10 2015-7-27 10:00:00 <-- duplicate
003 dummy1 20 2015-7-27 10:00:00
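The second row is considered a duplicate because it has the same name, value, and timestamp as the first row, even though its id is different.
Note: the first two rows are duplicates NOT because all of their columns are identical, but because of a self-defined rule.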
I tried to remove such duplicates using a window function:
select
id, name, value, timestamp
from
(select
id, name, value, timestamp,
DATEDIFF(SECOND, lag(timestamp, 1) over (partition by name order by timestamp),
timestamp) [TimeDiff]
from table) tab
But after running for an hour, it ran out of locks and raised an error:
Msg 1204, Level 19, State 4, Line 2
The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.
How can I remove these duplicate rows efficiently?
How about using a CTE? Something like this.
with DeDupe as
(
select id
, [name]
, [value]
, [timestamp]
, ROW_NUMBER() over (partition by [name], [value], [timestamp] order by id) as RowNum
from SomeTable
)
Delete DeDupe
where RowNum > 1;
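Since the question mentions running out of lock resources (Msg 1204), one option might be to run the same CTE delete in small batches so that no single statement has to hold locks for every duplicate at once. This is only a sketch under assumptions: dbo.SomeTable stands in for the real table, the batch size of 4000 is an arbitrary placeholder to tune, and each pass recomputes ROW_NUMBER over the table, so it is probably only practical if an index covering ([name], [value], [timestamp], id) exists.

```sql
-- Assumptions: dbo.SomeTable is a placeholder name; 4000 is an arbitrary batch size.
DECLARE @BatchSize int = 4000;
DECLARE @Deleted   int = 1;

WHILE @Deleted > 0
BEGIN
    -- The leading semicolon guarantees the previous statement is terminated,
    -- which WITH requires when it is not the first statement in a batch.
    ;WITH DeDupe AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY [name], [value], [timestamp]
                                  ORDER BY id) AS RowNum
        FROM dbo.SomeTable
    )
    DELETE TOP (@BatchSize) FROM DeDupe
    WHERE RowNum > 1;

    -- Stop once a pass finds no remaining duplicates.
    SET @Deleted = @@ROWCOUNT;
END;
```

Run outside an explicit transaction, each batch commits on its own, so locks are released between passes and the loop can be stopped and restarted without losing progress.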
Try something like this: determine the lowest ID for each set of values, then delete the rows whose ID is not the lowest.
Select Name, Value, TimeStamp, min(ID) as LowestID
into #temp1
From MyTable
group by Name, Value, TimeStamp
Delete a
from MyTable a
inner join #temp1 b
on a.Name = b.Name
and a.Value = b.Value
and a.Timestamp = b.timestamp
and a.ID <> b.LowestID
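A variation on the temp-table idea that may suit a 140-million-row table better: scan the big table once to collect the ids of the rows to remove, then delete by id in small batches. The sketch below is hypothetical in its details: dbo.MyTable, the int type of id, the #ToDelete and @Batch names, and the batch size of 4000 are all assumptions, and it presumes id is indexed (for example as the primary key).

```sql
-- Assumptions: dbo.MyTable and an int id column are placeholders; adjust to the real schema.
-- Step 1: one pass over the table to collect every id that is not the lowest in its group.
SELECT id
INTO #ToDelete
FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY Name, Value, [Timestamp]
                              ORDER BY id) AS RowNum
    FROM dbo.MyTable
) AS numbered
WHERE RowNum > 1;

CREATE CLUSTERED INDEX IX_ToDelete_id ON #ToDelete (id);

-- Step 2: move a batch of ids out of #ToDelete, delete those rows, repeat.
DECLARE @Batch TABLE (id int PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @Batch;

    DELETE TOP (4000) FROM #ToDelete
    OUTPUT deleted.id INTO @Batch (id);

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to process

    DELETE t
    FROM dbo.MyTable AS t
    INNER JOIN @Batch AS b
        ON b.id = t.id;
END;

DROP TABLE #ToDelete;
```

The expensive grouping work happens once, and each later delete touches only a few thousand rows located by id.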
If the only goal is to select the non-duplicate rows from the table, consider this script:
SELECT MIN(id), name, value, timestamp FROM table GROUP BY name, value, timestamp
If you need to delete the duplicate rows:
DELETE FROM table WHERE id NOT IN ( SELECT MIN(id) FROM table GROUP BY name, value, timestamp)
or
DELETE t FROM table t INNER JOIN
table t2 ON
t.name=t2.name AND
t.value=t2.value AND
t.timestamp=t2.timestamp AND
t2.id<t.id
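Whichever delete is used, it may be worth checking afterwards that no group still has more than one row. A small sketch, assuming the table is called dbo.SomeTable (a placeholder name):

```sql
-- Assumption: dbo.SomeTable is a placeholder for the real table name.
-- Lists every (name, value, timestamp) combination that still occurs more than once.
SELECT [name], [value], [timestamp], COUNT(*) AS Copies
FROM dbo.SomeTable
GROUP BY [name], [value], [timestamp]
HAVING COUNT(*) > 1;
```

An empty result means the table no longer contains duplicates under the name/value/timestamp rule.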