Remove duplicate rows based on a value in a field
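I have a very large table (a few million records). Some records have duplicates (based on FieldA), where the only difference is the value in FieldB. I would like to create a query that deletes all duplicate records based on FieldA and keeps the record with the lowest value in FieldB. Is this possible?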
Extracting those values seems pretty straightforward:
-- GROUP BY a already returns one row per a, so DISTINCT is not needed
select a,
       min(b) as b
from t
group by a;
Example fiddle: http://sqlfiddle.com/#!9/bc4c9/3
You should be able to adapt this into a delete.
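One possible adaptation, as a rough sketch rather than a tested solution: it reuses the t(a, b) names from the query above and removes every row whose b is larger than the smallest b for the same a. Rows that tie on the smallest b are all kept, and rows where b is NULL are left alone.

```sql
-- Sketch only: table t and columns a, b are the names from the query above.
-- Delete each row whose b is greater than the minimum b within its a group.
delete from t
where b > (select min(t2.b)
           from t t2
           where t2.a = t.a);
```

Some engines, MySQL among them, do not allow the table being deleted from to be read in a subquery like this, so the statement may need rewriting there. An index on (a, b) would likely help the correlated subquery on a table of this size.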
CREATE TABLE TABLE1
(
  FieldA VARCHAR2(30),
  FieldB VARCHAR2(30),
  FieldC VARCHAR2(30)
);
INSERT INTO TABLE1 VALUES ('DUMMYDATA-A1', 'DUMMYDATA-B1', 'DUMMYDATA-C1');
INSERT INTO TABLE1 VALUES ('DUMMYDATA-A1', 'DUMMYDATA-B4', 'DUMMYDATA-C1');
INSERT INTO TABLE1 VALUES ('DUMMYDATA-A1', 'DUMMYDATA-B3', 'DUMMYDATA-C1');
INSERT INTO TABLE1 VALUES ('DUMMYDATA-A1', 'DUMMYDATA-B2', 'DUMMYDATA-C1');
COMMIT;
-- Identify duplicates: within each FieldA, the lowest FieldB gets rank 1
SELECT FieldA,
       FieldB,
       FieldC,
       RANK() OVER (PARTITION BY FieldA ORDER BY FieldB ASC) AS COLUMN_ALIAS
FROM TABLE1;
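-- Perform the delete: remove every row ranked below the first in its FieldA group.
-- ROWID is aliased as RID so the outer query reads it as an ordinary column.
-- Rows that tie on the lowest FieldB all get rank 1 and are all kept.
DELETE
FROM TABLE1
WHERE ROWID IN
  (SELECT RID
   FROM
     (SELECT ROWID AS RID,
             RANK() OVER (PARTITION BY FieldA ORDER BY FieldB ASC) AS COLUMN_ALIAS
      FROM TABLE1
     )
   WHERE COLUMN_ALIAS > 1
  );
COMMIT;
SELECT * FROM TABLE1; -- contains a single record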
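The RANK function identifies the duplicate records, which makes it easy to delete only the duplicates and keep the original row. This has already been discussed here: Deleting duplicates rows from oracle. Hope this helps.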
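However, since DELETE itself is fairly slow, in a case like this (millions of records) it is better to enforce an appropriate constraint at INSERT time so that duplicates are never entered in the first place.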
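As an illustration of such a constraint, here is a minimal sketch that assumes FieldA alone is meant to be unique in TABLE1. The constraint name table1_fielda_uq is made up for the example.

```sql
-- Sketch only: the constraint name table1_fielda_uq is hypothetical.
-- Run this after the existing duplicates are removed; otherwise the ALTER fails.
ALTER TABLE TABLE1
  ADD CONSTRAINT table1_fielda_uq UNIQUE (FieldA);

-- With the constraint in place, inserting a FieldA value that already exists
-- is rejected (Oracle reports ORA-00001: unique constraint violated).
INSERT INTO TABLE1 VALUES ('DUMMYDATA-A1', 'DUMMYDATA-B9', 'DUMMYDATA-C1');
```

The trade-off is that the loading process then has to handle the rejected rows, for example by deciding whether an incoming row with a lower FieldB should replace the stored one.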