Postgres: delete records from a table, keeping the minimum and maximum values
I have a Postgres table like this:
|scanID|scandatetime    |eventcode|state|
-----------------------------------------
|12345 |2020-07-28 1:00 |123      |WA   |
|12345 |2020-07-28 2:00 |156      |WA   |
|12345 |2020-07-29 10:00|200      |OR   |
|34678 |2020-07-20 4:00 |123      |TX   |
|34678 |2020-07-20 8:00 |156      |AR   |
|34678 |2020-07-22 1:00 |200      |MS   |
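Basically, I want to delete rows so that only 2 rows remain per scanID: the row with the minimum scandatetime and the row with the maximum scandatetime.
The current workflow aggregates the data and writes it to this table every day, so after each write there may be a bunch of new scan events, but I only want to keep the max and the min. How would I go about doing that?
Edit: the desired resulting table would look like this: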
|scanID|scandatetime    |eventcode|state|
-----------------------------------------
|12345 |2020-07-28 1:00 |123      |WA   |
|12345 |2020-07-29 10:00|200      |OR   |
|34678 |2020-07-20 4:00 |123      |TX   |
|34678 |2020-07-22 1:00 |200      |MS   |
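To try the answers against the sample data, a setup script along these lines should work. It is only a sketch: the table name `ztable` is borrowed from one of the answers, and the column types are assumptions, since the question does not state them.

```sql
-- Hypothetical setup: column types are assumed, not given in the question.
-- Unquoted identifiers are folded to lowercase in Postgres, so scanID,
-- scanId and scanid all refer to the same column.
CREATE TABLE ztable (
    scanid       integer   NOT NULL,
    scandatetime timestamp NOT NULL,
    eventcode    integer,
    state        text
);

-- The six sample rows from the question.
INSERT INTO ztable (scanid, scandatetime, eventcode, state) VALUES
    (12345, '2020-07-28 01:00', 123, 'WA'),
    (12345, '2020-07-28 02:00', 156, 'WA'),
    (12345, '2020-07-29 10:00', 200, 'OR'),
    (34678, '2020-07-20 04:00', 123, 'TX'),
    (34678, '2020-07-20 08:00', 156, 'AR'),
    (34678, '2020-07-22 01:00', 200, 'MS');
```

The answers use different table names (`t`, `ztable`, `the_table`), so substitute whichever name your table actually has.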
You can use USING:
delete from t
using (select scanId, min(scandatetime) as min_sdt, max(scandatetime) as max_sdt
from t
group by scanid
) tt
where tt.scanId = t.scanId and t.scandatetime not in (tt.min_sdt, tt.max_sdt);
You can also phrase it like this:
delete from t
where scandatetime <> (select min(tt.scandatetime) from t tt where tt.scanid = t.scanid) and
      scandatetime <> (select max(tt.scandatetime) from t tt where tt.scanid = t.scanid);
A record is in the middle if there is (at least) one record above it and (at least) one below it:
DELETE FROM ztable d
WHERE EXISTS ( SELECT *
FROM ztable x
WHERE x.scanId = d.scanId
AND x.scandatetime < d.scandatetime
)
AND EXISTS ( SELECT *
FROM ztable x
WHERE x.scanId = d.scanId
AND x.scandatetime > d.scandatetime
);
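Both EXISTS subqueries probe by scanId and scandatetime, so a composite index on those two columns may help, and a quick check afterwards can confirm the result. The following is a sketch, not part of the original answer; the index name is hypothetical.

```sql
-- Assumed helper index (hypothetical name). It lets each EXISTS probe
-- look up earlier/later rows for a scanId without scanning the table.
CREATE INDEX IF NOT EXISTS ztable_scanid_sdt_idx
    ON ztable (scanid, scandatetime);

-- Run after the DELETE: lists any scanId that still has more than two rows.
-- Rows tied at the minimum or maximum timestamp are not deleted by the
-- EXISTS approach, so such ties are the only groups that should show up.
SELECT scanid,
       count(*)          AS rows_left,
       min(scandatetime) AS min_sdt,
       max(scandatetime) AS max_sdt
FROM ztable
GROUP BY scanid
HAVING count(*) > 2;
```

Whether the planner actually uses the index depends on table size and statistics, so treat it as something to test rather than a guaranteed speed-up.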
A similar trick, using row_number():
DELETE FROM ztable d
USING ( SELECT scanId, scandatetime
, row_number() OVER
(PARTITION BY scanId ORDER BY scandatetime ASC) rn
, row_number() OVER
(PARTITION BY scanId ORDER BY scandatetime DESC) rrn
FROM ztable
) x
WHERE x.scanId = d.scanId
AND x.scandatetime = d.scandatetime
AND x.rn <> 1 AND x.rrn <> 1
;
You can use NOT IN with a sub-select:
delete from the_table t1
where (scanid, scandatetime) not in (select scanid, min(scandatetime)
from the_table
group by scanid
union all
select scanid, max(scandatetime)
from the_table
group by scanid);
But I think the solution using exists will be faster.
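The speed claim is easy to check on your own data. One way, sketched here under the assumption that the table is called `ztable`, is to run each variant under EXPLAIN ANALYZE inside a transaction and roll it back. EXPLAIN ANALYZE really executes the DELETE, so the ROLLBACK is what keeps the rows intact.

```sql
-- Variant 1: the EXISTS formulation.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM ztable d
WHERE EXISTS (SELECT 1 FROM ztable x
              WHERE x.scanid = d.scanid
                AND x.scandatetime < d.scandatetime)
  AND EXISTS (SELECT 1 FROM ztable x
              WHERE x.scanid = d.scanid
                AND x.scandatetime > d.scandatetime);
ROLLBACK;  -- undo the delete so the next variant sees the same rows

-- Variant 2: the NOT IN formulation.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM ztable
WHERE (scanid, scandatetime) NOT IN (
        SELECT scanid, min(scandatetime) FROM ztable GROUP BY scanid
        UNION ALL
        SELECT scanid, max(scandatetime) FROM ztable GROUP BY scanid);
ROLLBACK;
```

Compare the reported execution times and plan shapes; results can differ noticeably with table size and available indexes.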