Oracle sql: filtering repeated rows that only differ by a tiny amount of time

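I have an Oracle table of event alarms. For reasons that are strange and unknown to me, alarms are sometimes repeated, so I have been asked to create a service that removes the repeated alarms from that Oracle table.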

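An alarm (a row in the table) counts as repeated when another row exists with exactly the same PKN_EVENTNAME and a RECEIVEDDATE that differs from it by only a tiny amount of time (say 10 seconds, up or down).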

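The first thing I am trying to do is write an Oracle SQL statement that groups all alarms by PKN_EVENTNAME and, within each group, separates out the repeated alarms (so they can be deleted later).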

I think I'm on the right track, but I'm stuck.

Any help?

My SQL so far:

select t1.ID, t1.PKN_EVENTNAME, t1.RECEIVEDDATE 
from PARQUIMETERS_ALARMS t1 
where 
  exists
     (select 'x' 
      from   PARQUIMETERS_ALARMS t2 
      where  t1.id <> t2.id and                                              -- Not the same row
             trunc(t2.RECEIVEDDATE) = trunc(t1.RECEIVEDDATE)                 -- Same date
             and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 10)  -- < 10 sec

Edit 1:

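With @Tejash's correction I see different results in the Visual Studio Oracle SQL browser, but I can't make sense of them. It isn't clear whether the results are the records to be deleted (the repeated alarms) or something else.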

Your exists condition is missing t1.PKN_EVENTNAME = t2.PKN_EVENTNAME, and your exists clause contains a condition that isn't needed.

Your query should look like this:

select t1.ID, t1.PKN_EVENTNAME, t1.RECEIVEDDATE 
from PARQUIMETERS_ALARMS t1 
where 
  exists
     (select 'x' 
      from   PARQUIMETERS_ALARMS t2 
      where  t1.id <> t2.id   -- Not the same row                                         
             --trunc(t2.RECEIVEDDATE) = trunc(t1.RECEIVEDDATE)   -- this is not needed
             and t1.PKN_EVENTNAME = t2.PKN_EVENTNAME -- added this
             and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 10) -- < 10 sec
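
Regarding the confusion in Edit 1: the exists query returns every row that has a near neighbour, so both members of a close pair come back, not only the later one. The following Python sketch imitates the filter on a few made-up rows (the event names and timestamps are hypothetical, not taken from the real table) to show which IDs the query would list.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows: (ID, PKN_EVENTNAME, RECEIVEDDATE)
rows = [
    (1, "DOOR_OPEN", datetime(2020, 1, 1, 10, 0, 0)),
    (2, "DOOR_OPEN", datetime(2020, 1, 1, 10, 0, 4)),    # 4 s after row 1
    (3, "DOOR_OPEN", datetime(2020, 1, 1, 10, 5, 0)),    # isolated alarm
    (4, "LOW_BATTERY", datetime(2020, 1, 1, 10, 0, 2)),  # different event name
]

WINDOW = timedelta(seconds=10)


def exists_filter(rows):
    """Keep a row if some OTHER row has the same event name and a
    timestamp less than 10 seconds away (mirrors the EXISTS subquery)."""
    return [
        r for r in rows
        if any(
            r[0] != o[0] and r[1] == o[1] and abs(r[2] - o[2]) < WINDOW
            for o in rows
        )
    ]


flagged_ids = [r[0] for r in exists_filter(rows)]
print(flagged_ids)  # [1, 2]
```

Rows 1 and 2 are both listed, so the result is the set of alarms involved in a repetition rather than a ready-made delete list. Deleting everything the query returns would also remove the first alarm of each pair.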

You can certainly write this using exists. However, using an analytic function is likely to be more efficient. Something like this:

with alarms as (
  select pa.*,
         lag(pa.RECEIVEDDATE) over (partition by pa.pkn_eventName
                                        order by pa.RECEIVEDDATE) prior_receivedDate
    from PARQUIMETERS_ALARMS pa
)
select *
  from alarms
 where receivedDate - prior_receivedDate <= interval '10' second;

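Note that I factored out the alarms subquery here so that you can easily run it on its own and look at the data set before applying the filter on prior_receivedDate that finds the repeated rows. That is often useful for debugging and visualizing the data. If you find it easier, though, you are free to write the query with an inline view instead.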
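
To illustrate what the lag-based filter selects, here is a minimal Python sketch of the same idea on invented rows (event names and timestamps are hypothetical). Each row is compared only with the immediately preceding row of the same event name, so the first alarm of a burst is never flagged.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical sample rows: (ID, PKN_EVENTNAME, RECEIVEDDATE)
rows = [
    (1, "DOOR_OPEN", datetime(2020, 1, 1, 10, 0, 0)),
    (2, "DOOR_OPEN", datetime(2020, 1, 1, 10, 0, 4)),    # 4 s after row 1
    (3, "DOOR_OPEN", datetime(2020, 1, 1, 10, 0, 12)),   # 8 s after row 2
    (4, "DOOR_OPEN", datetime(2020, 1, 1, 10, 5, 0)),    # isolated alarm
    (5, "LOW_BATTERY", datetime(2020, 1, 1, 10, 0, 2)),  # different event name
]

WINDOW = timedelta(seconds=10)


def lag_filter(rows):
    """Flag a row when the previous row of the same event name (ordered by
    timestamp) is at most 10 seconds older, like LAG ... OVER (PARTITION BY)."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    flagged = []
    for _, group in groupby(ordered, key=lambda r: r[1]):
        prior = None  # plays the role of prior_receivedDate (NULL for the first row)
        for r in group:
            if prior is not None and r[2] - prior <= WINDOW:
                flagged.append(r)
            prior = r[2]
    return flagged


repeated_ids = [r[0] for r in lag_filter(rows)]
print(repeated_ids)  # [2, 3]
```

Row 3 is flagged even though it is 12 seconds after row 1, because the comparison is against row 2. A chain of closely spaced alarms therefore collapses to its first member, which may or may not be the behaviour you want.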

You can take advantage of the range clause of analytic functions:

with dups as (
  select t1.*
       , row_number() over (
           partition by PKN_EVENTNAME, RECEIVEDDATE
           order by id
         ) as dup
  from PARQUIMETERS_ALARMS t1
), nodups as (
  select * from dups where dup = 1
), t as (
  select nodups.ID, nodups.PKN_EVENTNAME, nodups.RECEIVEDDATE
       , count(*) over (
           partition by nodups.PKN_EVENTNAME
           order by nodups.RECEIVEDDATE
           range between interval '10' second preceding and current row
         ) as cnt
  from nodups
)
select * from t where cnt = 1

(Update: added the CTEs dups and nodups after the OP showed in the comments that there are duplicate (PKN_EVENTNAME, RECEIVEDDATE) tuples.)

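Explanation: once the nodups CTE has cleaned up the data, the where condition keeps only the rows for which the preceding 10 seconds contain exactly one row (necessarily the current row itself).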
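
Note that this query returns the rows to keep rather than the rows to delete. A rough Python sketch of the same two steps, on invented rows (event names and timestamps are hypothetical), may make that easier to see.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows: (ID, PKN_EVENTNAME, RECEIVEDDATE)
rows = [
    (1, "DOOR_OPEN", datetime(2020, 1, 1, 10, 0, 0)),
    (2, "DOOR_OPEN", datetime(2020, 1, 1, 10, 0, 0)),    # exact duplicate of row 1
    (3, "DOOR_OPEN", datetime(2020, 1, 1, 10, 0, 6)),    # 6 s after row 1
    (4, "DOOR_OPEN", datetime(2020, 1, 1, 10, 5, 0)),    # isolated alarm
    (5, "LOW_BATTERY", datetime(2020, 1, 1, 10, 0, 2)),  # different event name
]

WINDOW = timedelta(seconds=10)


def keep_filter(rows):
    """Step 1 (dups/nodups): keep the lowest ID per (event name, timestamp).
    Step 2 (t): count rows of the same event name within the preceding
    10 seconds, current row included, and keep rows whose count is 1."""
    first_seen = {}
    for r in sorted(rows, key=lambda r: r[0]):
        first_seen.setdefault((r[1], r[2]), r)
    nodups = list(first_seen.values())

    kept = []
    for r in nodups:
        cnt = sum(
            1 for o in nodups
            if o[1] == r[1] and r[2] - WINDOW <= o[2] <= r[2]
        )
        if cnt == 1:
            kept.append(r)
    return kept


kept_ids = [r[0] for r in keep_filter(rows)]
print(kept_ids)  # [1, 4, 5]
```

Under these assumptions, the alarms to delete are the remaining IDs (2 and 3 in this sample), that is, everything in the table that the query does not return.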