SQL (POSTGRESQL) drop duplicate values based on certain columns only, keep newer value based on each duplicate set

I have the following SQL table called readings:

date        |  today  | yesterday | tomorrow | creationtime               | source
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x1
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x2
2021-01-01      150       0.9         1        2021-01-01 12:55:17....       x3
....
2021-02-15      110       0.3         1        2021-02-15 12:42:17....       x1
2021-02-15      110       0.1         1        2021-02-15 12:42:17....       x2
2021-02-15      150       0.9         1        2021-02-15 12:55:17....       x3
...
2021-02-15      110       0.5         0        2021-02-16 16:06:04.008673    x17
2021-02-15      110       0.5         0        2021-02-15 15:59:46.383677    x17
....
2021-02-15      700       0.7         1        2021-02-16 16:04:02.267478    x20
2021-02-15      110       0.7         1        2021-02-15 15:59:48.060236    x20
....
2021-02-22      110       0.5         1        2021-02-15 16:01:16.826577    x55
2021-02-22      110       0.5         1        2021-02-16 16:09:17.524436    x55

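There are 65 readings per day, one from each source, x1, x2, x3 through x65.

I have found duplicate readings on some days.

Sometimes the duplicated readings differ, so I want to keep the newer reading for that day, even if it was recorded the next day.

I want to delete the duplicates and keep the row with the newer creationtime, so that my table ends up looking like this: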

date        |  today  | yesterday | tomorrow | creationtime               | source
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x1
2021-01-01      110       0.5         0        2021-01-01 12:42:17....       x2
2021-01-01      150       0.9         1        2021-01-01 12:55:17....       x3
....
2021-02-15      110       0.3         1        2021-02-15 12:42:17....       x1
2021-02-15      110       0.1         1        2021-02-15 12:42:17....       x2
2021-02-15      150       0.9         1        2021-02-15 12:55:17....       x3
...
2021-02-15      110       0.5         0        2021-02-16 16:06:04.008673    x17
....
2021-02-15      700       0.7         1        2021-02-16 16:04:02.267478    x20
....
2021-02-22      110       0.5         1        2021-02-16 16:09:17.524436    x55

I tried:

create table new_readings as select distinct c.* from readings c;

But that just created a copy of the table and dropped only the rows that were completely identical.

You can use distinct on:

select distinct on (date, today, yesterday, tomorrow ) r.*
from readings r
order by date, today, yesterday, tomorrow, creationtime desc;

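The code below deletes the duplicate rows for each date and source in place, keeping the row with the latest creationtime:

-- delete a row when a newer row exists for the same date and source
delete from readings r1
    where exists(
        select * from readings r2
        where r2.creationtime > r1.creationtime
        and r2.source = r1.source
        and r2."date" = r1."date"
    );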
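
A delete that compares creationtime leaves both rows in place if two duplicates happen to share exactly the same creationtime. As a rough sketch of one way around that, assuming the table has no primary key (none is shown in the question), the rows can be ranked with row_number() and matched back through PostgreSQL's ctid system column. The alias names ranked and rn are made up for illustration:

```sql
-- Sketch only: assumes readings has no primary key, so the physical
-- row identifier ctid is used to match ranked rows back to the table.
begin;

delete from readings r
using (
    select ctid,
           row_number() over (
               partition by "date", source      -- one duplicate set per day and source
               order by creationtime desc       -- newest row gets rn = 1
           ) as rn
    from readings
) ranked
where r.ctid = ranked.ctid
  and ranked.rn > 1;                            -- remove everything except the newest row

-- inspect the table here, then commit (or rollback if something looks wrong)
commit;
```

Running it inside a transaction makes it possible to check the result before the delete becomes permanent.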

It seems pretty simple:

select distinct on ("date", source) *
from readings
order by "date", source, creationtime desc;

The above reads as "select only one (the latest) reading per source per day".
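
Since the question builds a new table with create table ... as, the same select can probably be dropped into that statement. This is a sketch that reuses the new_readings name from the question; the second query is only a sanity check and should return no rows if each date and source pair is now unique:

```sql
-- Build the deduplicated copy: one row per ("date", source), newest creationtime wins.
create table new_readings as
select distinct on ("date", source) *
from readings
order by "date", source, creationtime desc;

-- Sanity check: list any ("date", source) pair that still appears more than once.
select "date", source, count(*) as copies
from new_readings
group by "date", source
having count(*) > 1;
```

Note that create table ... as copies only the data, so any indexes or constraints on readings would have to be recreated on new_readings separately.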