SQL (PostgreSQL): drop duplicate values based on certain columns only, keeping the newer value in each duplicate set
I have the following SQL table called readings:
date | today | yesterday | tomorrow | creationtime | source
2021-01-01 110 0.5 0 2021-01-01 12:42:17.... x1
2021-01-01 110 0.5 0 2021-01-01 12:42:17.... x2
2021-01-01 150 0.9 1 2021-01-01 12:55:17.... x3
....
2021-02-15 110 0.3 1 2021-02-15 12:42:17.... x1
2021-02-15 110 0.1 1 2021-02-15 12:42:17.... x2
2021-02-15 150 0.9 1 2021-02-15 12:55:17.... x3
...
2021-02-15 110 0.5 0 2021-02-16 16:06:04.008673 x17
2021-02-15 110 0.5 0 2021-02-15 15:59:46.383677 x17
....
2021-02-15 700 0.7 1 2021-02-16 16:04:02.267478 x20
2021-02-15 110 0.7 1 2021-02-15 15:59:48.060236 x20
....
2021-02-22 110 0.5 1 2021-02-15 16:01:16.826577 x55
2021-02-22 110 0.5 1 2021-02-16 16:09:17.524436 x55
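There are 65 readings per day, from sources x1, x2, x3 up to x65.
On some days I have found duplicate readings for the same source.
Sometimes the duplicate readings differ, so I want to keep the newer reading for that day, even if it was recorded the following day.
I want to delete the duplicates and keep the row with the newer creationtime, so that my table ends up looking like this: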
date | today | yesterday | tomorrow | creationtime | source
2021-01-01 110 0.5 0 2021-01-01 12:42:17.... x1
2021-01-01 110 0.5 0 2021-01-01 12:42:17.... x2
2021-01-01 150 0.9 1 2021-01-01 12:55:17.... x3
....
2021-02-15 110 0.3 1 2021-02-15 12:42:17.... x1
2021-02-15 110 0.1 1 2021-02-15 12:42:17.... x2
2021-02-15 150 0.9 1 2021-02-15 12:55:17.... x3
...
2021-02-15 110 0.5 0 2021-02-16 16:06:04.008673 x17
....
2021-02-15 700 0.7 1 2021-02-16 16:04:02.267478 x20
....
2021-02-22 110 0.5 1 2021-02-16 16:09:17.524436 x55
I tried:
create table new_readings as select distinct c.* from readings c;
But that only creates a copy of the table with the rows that are identical in every column removed.
You can use distinct on:
select distinct on (date, today, yesterday, tomorrow ) r.*
from readings r
order by date, today, yesterday, tomorrow, creationtime desc;
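The statement below deletes the older duplicate rows for each date and source, keeping the row with the latest creationtime:
-- DELETE does not accept ORDER BY in PostgreSQL, so that clause is dropped
delete from readings r1
where exists (
    select 1 from readings r2
    where r2.source = r1.source
      and r2."date" = r1."date"
      -- a newer row exists for the same day and source, so r1 is the stale one
      and r2.creationtime > r1.creationtime
);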
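Before running a destructive delete, it can help to preview which rows would count as older duplicates. The query below is only a sketch: it assumes that a duplicate set is defined by ("date", source), as in the question's expected output, and the alias names ranked and rn are made up for illustration.

```sql
-- Rank the rows of each ("date", source) group from newest to oldest.
-- rn = 1 is the row to keep; anything with rn > 1 is an older duplicate.
SELECT *
FROM (
    SELECT r.*,
           row_number() OVER (
               PARTITION BY "date", source
               ORDER BY creationtime DESC
           ) AS rn
    FROM readings r
) ranked
WHERE rn > 1;
```

If the result contains only the stale rows (for example the 2021-02-15 15:59 readings for x17 and x20), the delete is targeting the right ones.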
This seems quite simple:
select distinct on ("date", source) *
from readings
order by "date", source, creationtime desc;
This reads as "select only one (the latest) reading per source per day".
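To end up with a deduplicated table rather than just a query result, the same query can be wrapped in CREATE TABLE ... AS, much like the attempt in the question. This is a minimal sketch, not a definitive migration: new_readings is the scratch name from the question, readings_old is a hypothetical backup name, and it assumes there are no indexes, constraints or grants on readings that need to be preserved, since CREATE TABLE AS does not copy them.

```sql
BEGIN;

-- Build the deduplicated copy: one row per ("date", source), newest creationtime wins.
CREATE TABLE new_readings AS
SELECT DISTINCT ON ("date", source) *
FROM readings
ORDER BY "date", source, creationtime DESC;

-- Swap the tables, keeping the original under a backup name (hypothetical).
ALTER TABLE readings RENAME TO readings_old;
ALTER TABLE new_readings RENAME TO readings;

COMMIT;
```

Keeping readings_old around until the new table has been checked makes the change easy to revert.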