MySQL: SUM durations within consecutive hours
I am using an older version of MySQL, where the WITH clause is not available.
Starting with this table:
+--------+---------------------+---------------------+
| person | start_time | end_time |
+--------+---------------------+---------------------+
| Alice | 2020-02-27 20:00:00 | 2020-02-27 20:59:59 |
| Alice | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 |
| Alice | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 |
| Alice | 2020-02-28 01:00:00 | 2020-02-28 01:59:59 |
| Bob | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 |
| Cindy | 2020-02-28 02:00:00 | 2020-02-28 02:59:59 |
| Cindy | 2020-02-28 03:00:00 | 2020-02-28 03:36:59 |
+--------+---------------------+---------------------+
I want a query that sums, per person, the durations of all rows that lie within one hour of each other.
+--------+---------------------+---------------------+----------+
| person | start_time | end_time | duration |
+--------+---------------------+---------------------+----------+
| Alice | 2020-02-27 20:00:00 | 2020-02-27 20:59:59 | 3599 |
| Alice | 2020-02-27 23:45:00 | 2020-02-28 01:59:59 | 8064 |
| Bob | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 | 899 |
| Cindy | 2020-02-28 02:00:00 | 2020-02-28 03:36:59 | 5806 |
+--------+---------------------+---------------------+----------+
A sample query that gives such a result set is:
SELECT t.person,t.start_time,t.end_time,
SUM(TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)) AS duration,
IF( EXISTS (SELECT * FROM test t1
WHERE t1.start_time=TIMESTAMPADD(SECOND,1,t.end_time)
OR TIMESTAMPDIFF(SECOND,t.start_time,t1.end_time)=-1),1,0) AS continuous
FROM test t
WHERE TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)
BETWEEN 0 AND 3599
GROUP BY t.person,continuous
ORDER BY t.person,t.start_time;
which is the same as:
SELECT t.person,t.start_time,t.end_time,
SUM(TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)) AS duration,
IF( EXISTS (SELECT * FROM test t1
WHERE t1.start_time=TIMESTAMPADD(SECOND,1,t.end_time)
OR TIMESTAMPDIFF(SECOND,t1.end_time,t.start_time)=1),1,0) AS continuous
FROM test t
WHERE TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)
BETWEEN 0 AND 3599
GROUP BY t.person,continuous
ORDER BY t.person,t.start_time;
Check both queries in this SQL Fiddle.
Edit
Following @Strawberry's comment, the queries above needed to be rewritten with a slight change.
SELECT t.person,t.start_time,t.end_time,
SUM(TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)) AS duration,
IF( EXISTS (SELECT * FROM test t1
WHERE t1.start_time=TIMESTAMPADD(SECOND,1,t.end_time)
OR TIMESTAMPDIFF(SECOND,t.start_time,t1.end_time)=-1),1,0) AS continuous
FROM test t
GROUP BY t.person,continuous
ORDER BY t.person,t.start_time;
which is the same as:
SELECT t.person,t.start_time,t.end_time,
SUM(TIMESTAMPDIFF(SECOND,t.start_time,t.end_time)) AS duration,
IF( EXISTS (SELECT * FROM test t1
WHERE t1.start_time=TIMESTAMPADD(SECOND,1,t.end_time)
OR TIMESTAMPDIFF(SECOND,t1.end_time,t.start_time)=1),1,0) AS continuous
FROM test t
GROUP BY t.person,continuous
ORDER BY t.person,t.start_time;
Check both queries in this SQL Fiddle.
E.g. (although, as written, this solution is specific to pre-8.0 versions of MySQL):
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(person VARCHAR(12) NOT NULL
,start_time DATETIME NOT NULL
,end_time DATETIME NOT NULL
,PRIMARY KEY(person,start_time)
);
INSERT INTO my_table VALUES
('Alice','2020-02-27 20:00:00','2020-02-27 20:59:59'),
('Alice','2020-02-27 23:45:00','2020-02-27 23:59:59'),
('Alice','2020-02-28 00:00:00','2020-02-28 00:59:59'),
('Alice','2020-02-28 01:00:00','2020-02-28 01:59:59'),
('Bob','2020-02-27 23:45:00','2020-02-27 23:59:59'),
('Cindy','2020-02-28 02:00:00','2020-02-28 02:59:59'),
('Cindy','2020-02-28 03:00:00','2020-02-28 03:36:59');
SELECT person
, MIN(start_time) start_time
, MAX(end_time) end_time
, SUM(TIME_TO_SEC(TIMEDIFF(end_time,start_time))) delta
FROM
( SELECT x.*
, CASE WHEN person = @prev_person
THEN CASE WHEN start_time <= @prev_end_time + INTERVAL 1 HOUR
THEN @i:=@i
ELSE @i:=@i+1 END
ELSE @i:=1 END i
, @prev_person := person
, @prev_end_time := end_time
FROM my_table x
, (SELECT @prev_person := null, @prev_end_time := null, @i:=0) vars
ORDER
BY person
, start_time
) a
GROUP
BY person,i;
+--------+---------------------+---------------------+-------+
| person | start_time | end_time | delta |
+--------+---------------------+---------------------+-------+
| Alice | 2020-02-27 20:00:00 | 2020-02-27 20:59:59 | 3599 |
| Alice | 2020-02-27 23:45:00 | 2020-02-28 01:59:59 | 8097 |
| Bob | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 | 899 |
| Cindy | 2020-02-28 02:00:00 | 2020-02-28 03:36:59 | 5818 |
+--------+---------------------+---------------------+-------+
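The session-variable query above walks the rows in (person, start_time) order and starts a new group whenever the person changes or the row starts more than one hour after the previous row's end. As a cross-check outside the database, here is a minimal Python sketch of that same scan. The function name `merge_within_hour` and the `ROWS` list are illustrative assumptions, not part of the original answer.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Sample data copied from the question.
ROWS = [
    ("Alice", "2020-02-27 20:00:00", "2020-02-27 20:59:59"),
    ("Alice", "2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("Alice", "2020-02-28 00:00:00", "2020-02-28 00:59:59"),
    ("Alice", "2020-02-28 01:00:00", "2020-02-28 01:59:59"),
    ("Bob",   "2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("Cindy", "2020-02-28 02:00:00", "2020-02-28 02:59:59"),
    ("Cindy", "2020-02-28 03:00:00", "2020-02-28 03:36:59"),
]


def merge_within_hour(rows, max_gap=timedelta(hours=1)):
    """Return (person, group_start, group_end, summed_seconds) tuples."""
    # Equivalent of ORDER BY person, start_time.
    parsed = sorted(
        (person, datetime.strptime(s, FMT), datetime.strptime(e, FMT))
        for person, s, e in rows
    )
    groups = []
    prev_person = prev_end = None  # play the role of @prev_person / @prev_end_time
    for person, start, end in parsed:
        secs = int((end - start).total_seconds())
        if person == prev_person and start <= prev_end + max_gap:
            # Same group: extend the end and add this row's own duration.
            group = groups[-1]
            group[2] = max(group[2], end)
            group[3] += secs
        else:
            # New person or a gap longer than max_gap: open a new group.
            groups.append([person, start, end, secs])
        prev_person, prev_end = person, end
    return [tuple(g) for g in groups]


for person, start, end, secs in merge_within_hour(ROWS):
    print(person, start.strftime(FMT), end.strftime(FMT), secs)
```

On the sample data this yields the same four groups as the table above, with summed durations 3599, 8097, 899 and 5818 seconds.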
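FWIW, I think rewriting the query this way renders it 'version agnostic', i.e. immune to the fair criticism that the order of evaluation of the elements is not guaranteed, but I could be wrong. In any case, in MySQL 8.0+ the following can be rewritten using the extended functionality available in that version.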
SELECT person
, MIN(start_time) start_time
, MAX(end_time) end_time
, SUM(TIME_TO_SEC(TIMEDIFF(end_time,start_time))) delta
FROM
( SELECT * FROM
( SELECT x.*
, CASE WHEN person = @prev_person
THEN CASE WHEN start_time <= @prev_end_time + INTERVAL 1 HOUR
THEN @i:=@i
ELSE @i:=@i+1 END
ELSE @i:=1 END i
, @prev_person := person
, @prev_end_time := end_time
FROM my_table x
, (SELECT @prev_person := null, @prev_end_time := null, @i:=0) vars
) k
ORDER
BY person
, start_time
) a
GROUP
BY person,i;
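For the MySQL 8.0+ route mentioned above, one common shape is a window-function query: `LAG()` fetches the previous row's end time per person, a flag marks rows that start a new group, and a running `SUM()` of that flag numbers the groups. The sketch below is not from the original answer. It runs the idea against an in-memory SQLite database (an assumption made only so the example is self-contained), so the time arithmetic uses SQLite's `datetime()` and `strftime()`; in MySQL the same steps would be written with `+ INTERVAL 1 HOUR` and `TIMESTAMPDIFF`. It needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Sample data copied from the question.
ROWS = [
    ("Alice", "2020-02-27 20:00:00", "2020-02-27 20:59:59"),
    ("Alice", "2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("Alice", "2020-02-28 00:00:00", "2020-02-28 00:59:59"),
    ("Alice", "2020-02-28 01:00:00", "2020-02-28 01:59:59"),
    ("Bob",   "2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("Cindy", "2020-02-28 02:00:00", "2020-02-28 02:59:59"),
    ("Cindy", "2020-02-28 03:00:00", "2020-02-28 03:36:59"),
]

QUERY = """
SELECT person,
       MIN(start_time) AS group_start,
       MAX(end_time)   AS group_end,
       SUM(secs)       AS delta
FROM (
    -- Running sum of the flag gives each row its group number.
    SELECT person, start_time, end_time, secs,
           SUM(is_new) OVER (PARTITION BY person ORDER BY start_time) AS grp
    FROM (
        -- is_new = 1 when there is no previous row for this person,
        -- or the row starts more than one hour after the previous end.
        SELECT person, start_time, end_time,
               CAST(strftime('%s', end_time) AS INTEGER)
                 - CAST(strftime('%s', start_time) AS INTEGER) AS secs,
               CASE WHEN start_time <= datetime(
                        LAG(end_time) OVER (PARTITION BY person
                                            ORDER BY start_time),
                        '+1 hour')
                    THEN 0 ELSE 1 END AS is_new
        FROM my_table
    )
)
GROUP BY person, grp
ORDER BY person, group_start
"""


def grouped_durations():
    """Load the sample rows into SQLite and run the window-function query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE my_table (person TEXT, start_time TEXT, end_time TEXT)"
    )
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in grouped_durations():
    print(row)
```

Because the grouping is derived from `ORDER BY` inside the window definitions, this form does not depend on the evaluation order of user variables.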
Doing this in a single query was not easy for me, but I managed it with a LEFT JOIN of the table to itself and a bunch of conditions in the ON clause:
SELECT A.Person,
MIN(A.start_time) AS start_time,
MAX(A.end_time) AS end_time,
TIME_TO_SEC(TIMEDIFF(MAX(A.end_time),MIN(A.start_time))) Duration,
CASE WHEN B.person IS NULL THEN 0 ELSE 1 END AS chk
FROM my_table A
LEFT JOIN my_table B
ON A.person=B.person
AND A.start_time - INTERVAL 1 HOUR < B.end_time -- when A.start_time minus 1 hour is smaller than B.end_time
AND A.end_time + INTERVAL 1 HOUR > B.start_time -- when A.end_time plus 1 hour is bigger than B.start_time
AND A.start_time <> B.start_time -- when A.start_time is not same as B.start_time
GROUP BY A.person,chk;
The base query is this:
SELECT *,CASE WHEN b.person IS NULL THEN 0 ELSE 1 END AS chk
FROM my_table a LEFT JOIN my_table b
ON a.person=b.person
AND a.start_time - INTERVAL 1 HOUR < b.end_time
AND a.end_time + INTERVAL 1 HOUR > b.start_time
AND a.start_time <> b.start_time;
which returns the following result:
+ ------ + ------------------- + ------------------- + ------ + ------------------- + ------------------- + --- +
| person | start_time | end_time | person | start_time | end_time | chk |
+ ------ + ------------------- + ------------------- + ------ + ------------------- + ------------------- + --- +
| Alice | 2020-02-27 20:00:00 | 2020-02-27 20:59:59 | NULL | NULL | NULL | 0 |
| Alice | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 | Alice | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 | 1 |
| Alice | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 | Alice | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 | 1 |
| Alice | 2020-02-28 01:00:00 | 2020-02-28 01:59:59 | Alice | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 | 1 |
| Alice | 2020-02-28 00:00:00 | 2020-02-28 00:59:59 | Alice | 2020-02-28 01:00:00 | 2020-02-28 01:59:59 | 1 |
| Bob | 2020-02-27 23:45:00 | 2020-02-27 23:59:59 | NULL | NULL | NULL | 0 |
| Cindy | 2020-02-28 03:00:00 | 2020-02-28 03:36:59 | Cindy | 2020-02-28 02:00:00 | 2020-02-28 02:59:59 | 1 |
| Cindy | 2020-02-28 02:00:00 | 2020-02-28 02:59:59 | Cindy | 2020-02-28 03:00:00 | 2020-02-28 03:36:59 | 1 |
+ ------ + ------------------- + ------------------- + ------ + ------------------- + ------------------- + --- +
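P/S: Thanks to Strawberry for the table structure and sample data.
Edit:
After Strawberry's comment, I agree that my previous query did not actually calculate the correct duration, because I was only taking the TIMEDIFF between MAX(end_time) and MIN(start_time), which measures the whole span including any gaps rather than the sum of the individual durations. I made some changes, and the updated query is as follows: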
SELECT person,
MIN(CASE WHEN starttime=0 THEN start_time ELSE starttime END) AS starttime,
MAX(CASE WHEN endtime=0 THEN end_time ELSE endtime END) AS endtime,
SUM(duration) AS duration,
CASE WHEN starttime=0 THEN 0 ELSE 1 END AS chk
FROM
(SELECT a.person, a.start_time,a.end_time,
ANY_VALUE(CASE WHEN b.start_time > a.end_time + INTERVAL 1 HOUR THEN 0
WHEN b.start_time IS NULL THEN a.start_time
ELSE a.start_time END) starttime,
ANY_VALUE(CASE WHEN b.start_time > a.end_time + INTERVAL 1 HOUR THEN 0
WHEN b.start_time IS NULL THEN a.end_time
ELSE a.end_time END) endtime,
TIME_TO_SEC(TIMEDIFF(a.end_time,a.start_time)) duration
FROM my_table a
LEFT JOIN my_table b ON a.person=b.person AND b.start_time > a.end_time
GROUP BY a.person,a.start_time,a.end_time) TT
GROUP BY person,chk;
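The difference between the two calculations described in this edit can be checked by hand on Alice's three near-contiguous rows. The following is a small Python sketch written for illustration only (the variable names are assumptions); it compares the span from the earliest start to the latest end with the sum of the per-row durations.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Alice's second group from the sample data.
segments = [
    ("2020-02-27 23:45:00", "2020-02-27 23:59:59"),
    ("2020-02-28 00:00:00", "2020-02-28 00:59:59"),
    ("2020-02-28 01:00:00", "2020-02-28 01:59:59"),
]
parsed = [
    (datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in segments
]

# TIMEDIFF(MAX(end_time), MIN(start_time)): the whole span, gaps included.
span = int(
    (max(e for _, e in parsed) - min(s for s, _ in parsed)).total_seconds()
)

# SUM of per-row TIMEDIFF(end_time, start_time): only the recorded time.
summed = sum(int((e - s).total_seconds()) for s, e in parsed)

print(span, summed)  # 8099 8097
```

Here the two values differ only by the two one-second gaps between rows, but with gaps of up to an hour inside a group the span would overstate the duration by far more.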
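Here is a fiddle: https://www.db-fiddle.com/f/8XHWhfhCYSj8zcFcmo2KUo/1
P/S: For testing purposes, I added another 'Bob' record in the fiddle.
It is somewhat similar to the previous query, except that this time I moved most of the ON conditions into the SELECT. I also used ANY_VALUE() to get around sql_mode=only_full_group_by. If that sql_mode is turned off, ANY_VALUE() is not needed. Note that if you are using MariaDB, it does not support ANY_VALUE().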