查找连续值的范围

Finding ranges of continous values

带有 MySQL table 和时间戳。

|Timestamps             |
|-----------------------|
|2021-08-01 14:00:00.000|
|2021-08-01 14:00:00.100|
|2021-08-01 14:00:00.200|
|2021-08-01 14:00:00.300|
|2021-08-01 14:00:00.600|
|2021-08-01 14:00:00.700|
|2021-08-01 14:00:00.800|
|2021-08-01 14:00:01.000|

我想获取连续数据的时间间隔。连续由频率定义,在这个例子中,它可以是 10Hz 期望的结果是

|Start                   | End                    |
|------------------------|------------------------|
|2021-08-01 14:00:00.000 | 2021-08-01 14:00:00.300|
|2021-08-01 14:00:00.600 | 2021-08-01 14:00:00.800|

我正在使用 MySQL 版本 5.7.35 并且不能使用 WITH 和其他功能。这可以快速完成吗? 有大约。现在每个 table 100000 个元素。

使用:

CREATE TABLE test_tbl (
              my_data timestamp(3)  
                       );

INSERT INTO test_tbl VALUES
('2021-08-01 14:00:00.000'),
('2021-08-01 14:00:00.100'),
('2021-08-01 14:00:00.200'),
('2021-08-01 14:00:00.300'),
('2021-08-01 14:00:00.600'),
('2021-08-01 14:00:00.700'),
('2021-08-01 14:00:00.800'),
('2021-08-01 14:00:01.000');


SELECT 
       MIN(my_data) start,
       MAX(my_data) end 
  FROM 
     ( SELECT *
            , CASE WHEN right(my_data,3) = @prev+100 THEN @i:=@i ELSE @i:=@i+100 END row_num
            , @prev:=right(my_data,3) prev
         FROM test_tbl
            , (SELECT @prev:= null,@i:=0) vars 
              ORDER BY row_num
     ) x 
 GROUP  BY row_num;
    

演示:https://www.db-fiddle.com/f/7yUJcuMJPncBBnrExKbzYz/98

插入速度

经验法则——“如果每秒写入次数少于 100 次,请不要担心性能。” (你问的是每秒 10 次,对吗?)(有超过 100 次的技巧。)

非窗口解决方案(8.0 或 10.2 之前的版本)

-- 样本数据

CREATE TABLE gaps (
    ts timestamp(3),
    PRIMARY KEY(ts)    -- Some index on ts is important here
);

INSERT INTO gaps VALUES
('2021-08-01 14:00:00.000'),
('2021-08-01 14:00:00.100'),
('2021-08-01 14:00:00.200'),
('2021-08-01 14:00:00.300'),
('2021-08-01 14:00:00.600'),
('2021-08-01 14:00:00.700'),
('2021-08-01 14:00:00.800'),
('2021-08-01 14:00:01.000'),  -- By itself
('2021-08-01 14:00:01.300');

-- 添加序号(通过AUTO_INCREMENT); 警告:这取决于 auto_increment_increment = 1

DROP TEMPORARY TABLE IF EXISTS gaps2;
CREATE TEMPORARY TABLE gaps2 (
    id INT AUTO_INCREMENT,
    PRIMARY KEY(id) )
SELECT  val
    FROM (
        ( SELECT a.ts as val
            FROM gaps AS a
            LEFT JOIN gaps AS b  ON a.ts = b.ts + interval 0.1 second
            WHERE b.ts IS NULL
        )
        UNION ALL
        ( SELECT  b.ts as val
            FROM gaps AS b
            LEFT JOIN gaps AS a  ON b.ts = a.ts - interval 0.1 second
            WHERE a.ts IS NULL
        )
        ORDER BY val
         ) AS cd;
         
select * from gaps2;
+----+-------------------------+
| id | val                     |
+----+-------------------------+
|  1 | 2021-08-01 14:00:00.000 |
|  2 | 2021-08-01 14:00:00.300 |
|  3 | 2021-08-01 14:00:00.600 |
|  4 | 2021-08-01 14:00:00.800 |
|  5 | 2021-08-01 14:00:01.000 |
|  6 | 2021-08-01 14:00:01.000 |
|  7 | 2021-08-01 14:00:01.300 |
|  8 | 2021-08-01 14:00:01.300 |
+----+-------------------------+

-- 枢轴:

SELECT  CEIL(id/2) AS pair,
        MIN(val) AS 'run_start',
        MAX(val) AS 'run_end'
    FROM gaps2
    GROUP BY pair
    ORDER BY pair;

+------+-------------------------+-------------------------+
| pair | run_start               | run_end                 |
+------+-------------------------+-------------------------+
|    1 | 2021-08-01 14:00:00.000 | 2021-08-01 14:00:00.300 |
|    2 | 2021-08-01 14:00:00.600 | 2021-08-01 14:00:00.800 |
|    3 | 2021-08-01 14:00:01.000 | 2021-08-01 14:00:01.000 |
|    4 | 2021-08-01 14:00:01.300 | 2021-08-01 14:00:01.300 |
+------+-------------------------+-------------------------+

得到第一个答案后,我开始修改它,昨晚我得到了一个工作示例。

SELECT 
       MIN(my_data) start,
       MAX(my_data) end 
  FROM 
     ( SELECT *
            , CASE WHEN UNIX_TIMESTAMP(my_data)*1000 >= @prev+90 and 
            UNIX_TIMESTAMP(my_data)*1000 <= @prev+110
      THEN @i:=@i ELSE @i:=@i+100 END row_num
            , @prev:=UNIX_TIMESTAMP(my_data)*1000 prev
         FROM test_tbl
            , (SELECT @prev:= null,@i:=0) vars 
              ORDER BY row_num
     ) x 
 GROUP  BY row_num;
    
    

我会标记第一个答案,因为它给了灵感。但感谢您的输入。