MSSQL: get rows with a time difference below X

I have a table with an ID, a CHAR and a DATETIME field. Now I want to get all rows whose time difference (DATEDIFF) to another row is no more than 5 minutes.


Sample data for reference:

  ID2    CHA       Timer
   1       B      2018-03-06 11:31:39
   2       S      2018-03-06 11:33:39
   3       B      2018-03-06 11:39:39
   4       S      2018-03-06 11:45:39
   5       B      2018-03-06 11:46:39
   6       S      2018-03-06 11:47:39
   7       B      2018-03-06 11:48:39
   8       S      2018-03-06 11:50:39
   9       B      2018-03-06 11:51:39
   10      S      2018-03-06 11:59:39

Expected output:

  ID2    CHA       Timer
   1       B      2018-03-06 11:31:39
   2       S      2018-03-06 11:33:39
   4       S      2018-03-06 11:45:39
   5       B      2018-03-06 11:46:39
   6       S      2018-03-06 11:47:39
   7       B      2018-03-06 11:48:39
   8       S      2018-03-06 11:50:39
   9       B      2018-03-06 11:51:39

My current query looks like this:

select *
from t t1
inner join t t2
on t1.ID = t2.ID
where datediff(minute, t1.timer, t2.timer)<=5

Unfortunately, this returns the same entries multiple times. I think this is because of the INNER JOIN, but I can't be sure.

How can I get the desired result?


Here is a Sqlfiddle to test it yourself.
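For anyone who wants to reproduce the problem locally, the sample data above could be loaded with a setup script along these lines. This is only a sketch: the column types are assumptions (the question names the kinds of fields but not their exact types), and the columns are named `id`, `cha` and `timer` because that is how the queries in this thread refer to them.

```sql
-- Assumed schema: the types below are guesses, not taken from the question.
CREATE TABLE t (
    id    INT      NOT NULL PRIMARY KEY,
    cha   CHAR(1)  NOT NULL,
    timer DATETIME NOT NULL
);

-- ISO 8601 literals (with the 'T') parse the same way under any DATEFORMAT setting.
INSERT INTO t (id, cha, timer) VALUES
    (1,  'B', '2018-03-06T11:31:39'),
    (2,  'S', '2018-03-06T11:33:39'),
    (3,  'B', '2018-03-06T11:39:39'),
    (4,  'S', '2018-03-06T11:45:39'),
    (5,  'B', '2018-03-06T11:46:39'),
    (6,  'S', '2018-03-06T11:47:39'),
    (7,  'B', '2018-03-06T11:48:39'),
    (8,  'S', '2018-03-06T11:50:39'),
    (9,  'B', '2018-03-06T11:51:39'),
    (10, 'S', '2018-03-06T11:59:39');
```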

Well, you can use a CROSS JOIN instead of an INNER JOIN:

select *
from t t1
cross join t t2
where 
    ABS(datediff(minute, t1.timer, t2.timer))<=5
    AND t1.id < t2.id

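This gives all possible pairs of rows whose time difference is at most 5 minutes.

The condition t1.id < t2.id is needed to return only one instance of each pair.

If you are not interested in the pairs as such, simply put each side of every pair into one list. UNION removes the duplicates.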

WITH
CTE_Pairs
AS
(
  select
    T1.id AS id1
    ,T1.cha AS cha1
    ,T1.timer AS timer1
    ,T2.id AS id2
    ,T2.cha AS cha2
    ,T2.timer AS timer2
  from t t1
  cross join t t2
  where 
      ABS(datediff(second, t1.timer, t2.timer)) <= 5*60
      AND t1.id < t2.id
)
SELECT 
  id1 AS id
  ,cha1 AS cha
  ,timer1 AS timer
FROM CTE_Pairs

UNION

SELECT 
  id2 AS id
  ,cha2 AS cha
  ,timer2 AS timer
FROM CTE_Pairs

ORDER BY id
;
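Note that the first query in this answer compares DATEDIFF(minute, ...) with 5, while the CTE version compares DATEDIFF(second, ...) with 5*60. The two are not quite equivalent, because DATEDIFF counts the datepart boundaries crossed between the two values rather than the elapsed time. A small illustration with made-up timestamps (not from the sample data):

```sql
-- Two hypothetical timestamps that are 5 min 58 s apart.
DECLARE @a DATETIME = '2018-03-06T11:31:01';
DECLARE @b DATETIME = '2018-03-06T11:36:59';

SELECT DATEDIFF(minute, @a, @b) AS minute_boundaries,  -- 5, so "<= 5" accepts the pair
       DATEDIFF(second, @a, @b) AS elapsed_seconds;    -- 358, so "<= 5*60" rejects it
```

In the sample data every timer ends in :39 seconds, so both forms give the same rows there; on real data the second-based comparison is probably the closer match to "within 5 minutes".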

Update

Here is a new solution using the updated data.

SELECT  *
  FROM  t AS t1
  WHERE EXISTS
      ( SELECT  1
          FROM  t AS t2
          WHERE ( t1.timer <= DATEADD( MINUTE, 5, t2.timer )
                  AND t1.timer >= DATEADD( MINUTE, -5, t2.timer ))
                AND t1.id <> t2.id)
;

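This returns every row that has another row within 5 minutes before or after it. If you run this query against a large amount of data, it should be able to use an index on the timer column.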
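As a rough sketch of what that could look like: the index name IX_t_timer below is hypothetical, and whether the optimizer actually picks the index depends on the data and the plan. The second statement expresses the intended condition (the two timers lie within 5 minutes of each other) with t2.timer left bare on one side of the comparison, which may make an index seek on it easier.

```sql
-- Hypothetical index name; adjust to your own naming conventions.
CREATE NONCLUSTERED INDEX IX_t_timer ON t (timer);

-- Same "within 5 minutes of another row" condition,
-- written so that t2.timer is not wrapped in a function call.
SELECT  *
  FROM  t AS t1
  WHERE EXISTS
      ( SELECT  1
          FROM  t AS t2
          WHERE t2.timer BETWEEN DATEADD( MINUTE, -5, t1.timer )
                             AND DATEADD( MINUTE,  5, t1.timer )
                AND t1.id <> t2.id)
;
```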

Old, outdated answer

You were very close. You need to join on the char field, excluding rows where the IDs match.

select t2.*
from t t1
inner join t t2
on t1.cha = t2.cha
and t1.id <> t2.id
where datediff(minute, t1.timer, t2.timer) <=5
order by t2.id;

This should do the trick:

select distinct t1.*
from t t1
inner join t t2
on t1.ID <> t2.ID
where datediff(minute, t1.timer, t2.timer) between -5 and 5

You can use the LEAD and LAG window functions:

select id, cha, timer
from (
  select id, cha, timer,   
         COALESCE(datediff(minute,                   
                           lag(timer) over (order by id),
                           timer) 
                  , 10) prev_diff,
         COALESCE(datediff(minute, 
                           timer, 
                           lead(timer) over (order by id))
                  , 10) next_diff
   from t) as x
where prev_diff <= 5 or next_diff <= 5 

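LEAD fetches the timer value of the next record, while LAG fetches the value of the previous record. If the difference between the current value and either of these two values is 5 or less, the row matches.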
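To see what the two differences look like, here is a small sketch that runs the same LAG/LEAD expressions over just three rows of the question's sample data (ids 2 to 4), supplied inline through a VALUES list so that no table is needed. The COALESCE is left out on purpose to show the raw values.

```sql
SELECT id,
       DATEDIFF(minute, LAG(timer)  OVER (ORDER BY id), timer) AS prev_diff,
       DATEDIFF(minute, timer, LEAD(timer) OVER (ORDER BY id)) AS next_diff
FROM (VALUES (2, CAST('2018-03-06T11:33:39' AS DATETIME)),
             (3, CAST('2018-03-06T11:39:39' AS DATETIME)),
             (4, CAST('2018-03-06T11:45:39' AS DATETIME))) AS s(id, timer);
-- id 2: prev_diff NULL, next_diff 6
-- id 3: prev_diff 6,    next_diff 6
-- id 4: prev_diff 6,    next_diff NULL
```

Row 3 is 6 minutes away from both neighbours, which is why it is missing from the expected output. At the edges LAG and LEAD return NULL; the COALESCE(..., 10) in the query above replaces that with a value larger than 5 so the intent is explicit, although a NULL comparison would not satisfy `<= 5` either.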


Update:

If the id field cannot be used to determine the row order, you can use a number generated by ROW_NUMBER instead:

;with t_rn AS (
   select id, cha, timer,
          row_number() over (order by timer) as rn
   from t
)
select id, cha, timer
from (
   select id, cha, timer,   
          coalesce(datediff(minute,                   
                            lag(timer) over (order by rn),
                            timer) 
                   , 10) prev_diff,
          coalesce(datediff(minute, 
                            timer, 
                            lead(timer) over (order by rn))
                   , 10) next_diff
   from t_rn) as x
where  prev_diff <= 5 or next_diff <= 5 


Thanks to @Vladimir, who can see what I can't, the above query can be simplified to:

select id, cha, timer
from (
   select id, cha, timer,   
          coalesce(datediff(minute,                   
                            lag(timer) over (order by timer),
                            timer) 
                   , 10) prev_diff,
          coalesce(datediff(minute, 
                            timer, 
                            lead(timer) over (order by timer))
                   , 10) next_diff
   from t) as x
where  prev_diff <= 5 or next_diff <= 5