Select the row with the minimum value in a table that has duplicate data

Question

I have an AWS Athena (Presto) table in which some rows are duplicates except for one column, modifydate:

"feeder","circuitid","pole","phase","starttime","endtime","modifydate","eventduration"
"SOUTH","27802","1860981454636","C","2020-09-16 03:43:00.000","2020-09-16 03:49:00.000","2020-09-23 11:00:00.000","6"
"SOUTH","27802","1860981454636","C","2020-09-16 03:43:00.000","2020-09-16 03:49:00.000","2020-09-16 03:49:00.000","6"

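I need to select a set of records from this table using a WHERE clause based on starttime. Where rows are duplicated (fewer than 2% of the table), I want to select only the row with the minimum (oldest) modifydate. In the example above, the second row should be returned.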

Here is what I have tried:

SELECT p.*
FROM event_frames AS p
WHERE starttime BETWEEN date '2020-09-16' AND date '2020-09-17'
  AND modifydate = (
        SELECT MIN(p2.modifydate)
        FROM event_frames AS p2
        WHERE p.feeder = p2.feeder
          AND p.phase = p2.phase
          AND p.circuitid = p2.circuitid
          AND p.polenumbers = p2.polenumbers
          AND p.starttime = p2.starttime
          AND p.endtime = p2.endtime
          AND p.eventduration = p2.eventduration
      )

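This works, but only for the rows that are duplicated as described above. I need to return both the single (non-duplicated) rows and, for the duplicated rows, the one with MIN(modifydate).

Is this possible?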

Answer 1

You can use aggregation:

select feeder, phase, circuitid, polenumbers, starttime, endtime, eventduration,
       min(modifydate) as modifydate
from event_frames
group by feeder, phase, circuitid, polenumbers, starttime, endtime, eventduration;
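
If the starttime filter from the question is also needed, it can go in a WHERE clause before the GROUP BY. As an alternative sketch (not part of the original answer), ROW_NUMBER() can rank the rows within each group of duplicates and keep only the oldest one. Rows without duplicates form a group of one and are returned as well. Both queries assume the column names used in the queries above (for example polenumbers), and `ranked` and `rn` are illustrative names only:

```sql
-- Variant 1: the aggregation above, restricted to the date range from the question.
SELECT feeder, phase, circuitid, polenumbers, starttime, endtime, eventduration,
       MIN(modifydate) AS modifydate
FROM event_frames
WHERE starttime BETWEEN DATE '2020-09-16' AND DATE '2020-09-17'
GROUP BY feeder, phase, circuitid, polenumbers, starttime, endtime, eventduration;

-- Variant 2: number the rows in each duplicate group by modifydate (oldest first)
-- and keep only the first row of every group.
SELECT feeder, phase, circuitid, polenumbers, starttime, endtime, modifydate, eventduration
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (
               PARTITION BY feeder, phase, circuitid, polenumbers,
                            starttime, endtime, eventduration
               ORDER BY modifydate ASC
           ) AS rn
    FROM event_frames AS p
    WHERE starttime BETWEEN DATE '2020-09-16' AND DATE '2020-09-17'
) AS ranked
WHERE rn = 1;
```

The window-function form tends to be easier to extend if the table has further columns that should be returned unchanged from the oldest row, since they only need to be added to the outer select list rather than to the GROUP BY.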

sql

presto

amazon-athena