Postgres: delete records from a table, keeping the minimum and maximum value

Question

I have a Postgres table like this:

|scanID|scandatetime     |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00  |123      |WA   |
|12345 |2020-07-28 2:00  |156      |WA   |
|12345 |2020-07-29 10:00 |200      |OR   |
|34678 |2020-07-20 4:00  |123      |TX   |
|34678 |2020-07-20 8:00  |156      |AR   |
|34678 |2020-07-22 1:00  |200      |MS   |

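Basically, I want to delete rows so that only 2 rows remain for each scan ID: the row with the minimum scandatetime and the row with the maximum scandatetime.

The current workflow aggregates data daily and writes it to this table, so after each write there may be a bunch of new scan events, but I only want to keep the max and the min. How can I do this?

Edit: the desired result table would look like this: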

|scanID|scandatetime     |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00  |123      |WA   |
|12345 |2020-07-29 10:00 |200      |OR   |
|34678 |2020-07-20 4:00  |123      |TX   |
|34678 |2020-07-22 1:00  |200      |MS   |
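
For anyone who wants to try the answers on the sample data, here is a minimal setup sketch. The column types are assumptions (the question does not state them), and the table is named t to match Answer 1; rename it to ztable or the_table when trying the other answers.

```sql
-- Hypothetical setup: column types are assumed, not taken from the question.
CREATE TABLE t (
    scanid       integer   NOT NULL,
    scandatetime timestamp NOT NULL,
    eventcode    integer   NOT NULL,
    state        text      NOT NULL
);

-- The six sample rows from the question.
INSERT INTO t (scanid, scandatetime, eventcode, state) VALUES
    (12345, '2020-07-28 01:00', 123, 'WA'),
    (12345, '2020-07-28 02:00', 156, 'WA'),
    (12345, '2020-07-29 10:00', 200, 'OR'),
    (34678, '2020-07-20 04:00', 123, 'TX'),
    (34678, '2020-07-20 08:00', 156, 'AR'),
    (34678, '2020-07-22 01:00', 200, 'MS');
```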

Answer 1

You can use USING:

delete from t
    using (select scanId, min(scandatetime) as min_sdt, max(scandatetime) as max_sdt
           from t
           group by scanid
          ) tt
    where tt.scanId = t.scanId and t.scandatetime not in (tt.min_sdt, tt.max_sdt);

You can also phrase this as:

delete from t
    where scandatetime <> (select min(tt.scandatetime) from t tt where tt.scanid = t.scanid) and
          scandatetime <> (select max(tt.scandatetime) from t tt where tt.scanid = t.scanid) ;

Answer 2

A record is in the middle if there is (at least) one record above it and (at least) one below it:

DELETE FROM ztable d
WHERE EXISTS ( SELECT *         
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime < d.scandatetime
        )
AND EXISTS ( SELECT *
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime > d.scandatetime
        );
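
Before deleting anything, it may be worth previewing which rows the condition selects. A minimal sketch, reusing the same two EXISTS conditions in a SELECT against the same ztable:

```sql
-- Preview only: lists the "middle" rows that the DELETE above would remove.
SELECT d.*
FROM ztable d
WHERE EXISTS ( SELECT *
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime < d.scandatetime
        )
AND EXISTS ( SELECT *
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime > d.scandatetime
        )
ORDER BY d.scanId, d.scandatetime;
```

On the sample data from the question, this should list only the two eventcode 156 rows (12345 at 2020-07-28 2:00 and 34678 at 2020-07-20 8:00).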

A similar trick, using row_number():

DELETE FROM ztable d
USING ( SELECT scanId, scandatetime
        , row_number() OVER 
                (PARTITION BY scanId ORDER BY scandatetime ASC) rn
        , row_number() OVER
                (PARTITION BY scanId ORDER BY scandatetime DESC) rrn
        FROM  ztable 
        ) x
        WHERE x.scanId = d.scanId
        AND x.scandatetime = d.scandatetime
        AND x.rn <> 1 AND x.rrn <> 1
        ;
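
One caveat with the row_number() version: it joins back on (scanId, scandatetime), so if two rows of the same scan ID share a timestamp, a subquery row with rn <> 1 can match both of them and delete the one that was meant to be kept. The sample data has no such ties, so this may not matter here. A hedged sketch of a variant that joins on the system column ctid instead, so each subquery row matches exactly one physical row:

```sql
-- Variant sketch: identify rows by ctid rather than by (scanId, scandatetime).
-- With tied timestamps, which of the tied rows survives is arbitrary.
DELETE FROM ztable d
USING ( SELECT ctid
        , row_number() OVER
                (PARTITION BY scanId ORDER BY scandatetime ASC) rn
        , row_number() OVER
                (PARTITION BY scanId ORDER BY scandatetime DESC) rrn
        FROM  ztable
        ) x
        WHERE x.ctid = d.ctid
        AND x.rn <> 1 AND x.rrn <> 1
        ;
```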

Answer 3

You can use NOT IN with a sub-select:

delete from the_table t1
where (scanid, scandatetime) not in (select scanid, min(scandatetime)
                                     from the_table
                                     group by scanid
                                     union all
                                     select scanid, max(scandatetime)
                                     from the_table
                                     group by scanid);

But I think the solutions using EXISTS will be faster.
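
Whether EXISTS is really faster depends on the data and the available indexes, so it is probably best measured. A rough sketch of how the two variants could be compared: EXPLAIN ANALYZE actually executes the DELETE, so each run is wrapped in a transaction that is rolled back. The index and its name are assumptions, not something from the question.

```sql
-- Optional, hypothetical index; its name and its benefit here are assumptions.
CREATE INDEX IF NOT EXISTS the_table_scanid_sdt_idx
    ON the_table (scanid, scandatetime);

-- Variant 1: NOT IN with a sub-select.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM the_table t1
WHERE (scanid, scandatetime) NOT IN (SELECT scanid, min(scandatetime)
                                     FROM the_table
                                     GROUP BY scanid
                                     UNION ALL
                                     SELECT scanid, max(scandatetime)
                                     FROM the_table
                                     GROUP BY scanid);
ROLLBACK;  -- undo the delete that EXPLAIN ANALYZE performed

-- Variant 2: the EXISTS form, written against the same table.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM the_table d
WHERE EXISTS (SELECT * FROM the_table x
              WHERE x.scanid = d.scanid
              AND x.scandatetime < d.scandatetime)
AND EXISTS (SELECT * FROM the_table x
            WHERE x.scanid = d.scanid
            AND x.scandatetime > d.scandatetime);
ROLLBACK;
```

Comparing the reported execution times and plans on a realistic copy of the table should give a better answer than a guess.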
