Grouping Timestamps based on the interval between them
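I have a table in Hive (SQL) with a bunch of timestamps that need to be grouped into separate sessions based on the time difference between consecutive timestamps.
Example:
Consider the following timestamps (given as HH.MM for simplicity):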
9.00
9.10
9.20
9.40
9.43
10.30
10.45
11.25
12.30
12.33
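and so on.
Every timestamp that falls within 30 minutes of the next one belongs to the same session,
i.e. 9.00, 9.10, 9.20, 9.40 and 9.43 form one session.
But since the difference between 9.43 and 10.30 is more than 30 minutes, the timestamp 10.30 falls into a different session. Likewise, 10.30 and 10.45 fall into one session.
Once these sessions have been created, we have to obtain the minimum and maximum timestamp of each session.
I tried subtracting the current timestamp from its LEAD and setting a flag when the difference is greater than 30 minutes, but I am having trouble getting that to work.
Any suggestions are greatly appreciated. Let me know if the question is not clear enough.
Expected output for this sample data: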
Session_start Session_end
9.00 9.43
10.30 10.45
11.25 11.25 (same because the next time is not within 30 mins)
12.30 12.33
Hope this helps.
Try this:
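-- MySQL only: uses user variables and needs ONLY_FULL_GROUP_BY to be disabled.
-- Aggregate over A.column1: the first timestamp has no matching row in B.
SELECT DATE_FORMAT(MIN(STR_TO_DATE(A.column1, '%H.%i')), '%H.%i') AS Session_start,
       DATE_FORMAT(MAX(STR_TO_DATE(A.column1, '%H.%i')), '%H.%i') AS Session_end
FROM tableA A
LEFT JOIN ( -- rnk is the session number; it increases whenever diff is 30 minutes or more
            SELECT A.column1, diff, IF(@diff:=diff < 30, @id, @id:=@id+1) AS rnk
            FROM ( -- diff = minutes between a timestamp (B) and an earlier one (A).
                   -- Intended to pair each row with the next later one; MySQL does not
                   -- guarantee which B row is returned per group.
                   SELECT B.column1, TIME_TO_SEC(TIMEDIFF(STR_TO_DATE(B.column1, '%H.%i'), STR_TO_DATE(A.column1, '%H.%i'))) / 60 AS diff
                   FROM tableA A
                   INNER JOIN tableA B ON STR_TO_DATE(A.column1, '%H.%i') < STR_TO_DATE(B.column1, '%H.%i')
                   GROUP BY STR_TO_DATE(A.column1, '%H.%i')
                 ) AS A, (SELECT @diff:=0, @id:= 1) AS B
          ) AS B ON A.column1 = B.column1
-- The first timestamp has a NULL rnk, so it is placed in session 1.
GROUP BY IFNULL(B.rnk, 1);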
Output
| SESSION_START | SESSION_END |
|---------------|-------------|
| 9.00 | 9.43 |
| 10.30 | 10.45 |
| 11.25 | 11.25 |
| 12.30 | 12.33 |
Try this:
SELECT MIN(session_time_tmp) session_start, MAX(session_time_tmp) session_end FROM
(
    -- Start a new session number when the gap to the previous row exceeds 30 minutes
    SELECT IF((TIME_TO_SEC(TIMEDIFF(your_time_field, COALESCE(@previousValue, your_time_field))) / 60) > 30 ,
              @sessionCount := @sessionCount + 1, @sessionCount ) sessCount,
           ( @previousValue := your_time_field ) session_time_tmp FROM
    (
        -- Sort the rows and initialise the MySQL user variables
        SELECT your_time_field, @previousValue:= NULL, @sessionCount := 1 FROM yourtable ORDER BY your_time_field
    ) a
) b
GROUP BY sessCount
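Just replace yourtable and your_time_field.
As MySQL (before version 8.0) lacks the LAG and LEAD functions, getting the previous or next record already takes some work. Here is how:
select
  thetime,
  (select max(thetime) from mytable afore where afore.thetime < mytable.thetime) as afore_time,
  (select min(thetime) from mytable after where after.thetime > mytable.thetime) as after_time
from mytable;
Building on this, we can construct the whole query by looking for gaps (i.e. a time difference of more than 30 minutes = 1800 seconds to the previous or next record).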
select
  startrec.thetime as start_time,
  (
    select min(endrec.thetime)
    from
    (
      select
        thetime,
        coalesce(time_to_sec(timediff((select min(thetime) from mytable after where after.thetime > mytable.thetime), thetime)), 1801) > 1800 as gap
      from mytable
    ) endrec
    where gap
    and endrec.thetime >= startrec.thetime
  ) as end_time
from
(
  select
    thetime,
    coalesce(time_to_sec(timediff(thetime, (select max(thetime) from mytable afore where afore.thetime < mytable.thetime))), 1801) > 1800 as gap
  from mytable
) startrec
where gap;
SQL fiddle: http://www.sqlfiddle.com/#!2/d307b/20.
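For readers who want to check the logic outside the database, here is a minimal Python sketch of the same idea: a record starts a session when nothing precedes it within 30 minutes, it ends a session when nothing follows it within 30 minutes, and each start is paired with the earliest end at or after it. The function names (`parse`, `fmt`, `sessions_by_boundaries`) are made up for this illustration, and the input is assumed to be a list of H.MM strings on a single day.

```python
from datetime import datetime, timedelta

def parse(stamp):
    # "9.43" -> datetime on a dummy date (only the time part matters here)
    return datetime.strptime(stamp, "%H.%M")

def fmt(t):
    # Render back in the question's H.MM style, e.g. 9.00
    return f"{t.hour}.{t.minute:02d}"

def sessions_by_boundaries(stamps, gap=timedelta(minutes=30)):
    times = sorted(parse(s) for s in stamps)
    last = len(times) - 1
    # Start records: no predecessor, or the predecessor is more than `gap` earlier
    starts = [t for i, t in enumerate(times)
              if i == 0 or t - times[i - 1] > gap]
    # End records: no successor, or the successor is more than `gap` later
    ends = [t for i, t in enumerate(times)
            if i == last or times[i + 1] - t > gap]
    # Pair each start with the earliest end that is not before it
    return [(fmt(s), fmt(min(e for e in ends if e >= s))) for s in starts]

sample = ["9.00", "9.10", "9.20", "9.40", "9.43",
          "10.30", "10.45", "11.25", "12.30", "12.33"]
for start, end in sessions_by_boundaries(sample):
    print(start, end)
```

Like the correlated subqueries, this scans the list of end records once per start record, so it is meant for understanding the approach rather than for large inputs.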
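So it is not MySQL, but Hive. I don't know Hive, but if it supports LAG, as you say, try this PostgreSQL query. You may have to change the time difference calculation, as this usually differs from one DBMS to another.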
select min(thetime) as start_time, max(thetime) as end_time
from
(
  -- running count of the gap flags in time order = session id
  select thetime, count(gap) over (order by thetime rows between unbounded preceding and current row) as groupid
  from
  (
    -- gap is 1 when the previous row is more than 30 minutes earlier, otherwise NULL
    select thetime, case when thetime - lag(thetime) over (order by thetime) > interval '30 minutes' then 1 end as gap
    from mytable
  ) times
) groups
group by groupid
order by min(thetime);
The query finds the gaps, then uses a running total of the gap count to build group IDs, and the rest is aggregation.
SQL fiddle: http://www.sqlfiddle.com/#!17/8bc4a/6.
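The three steps (flag the gaps, take a running count of the flags, aggregate per group) can also be traced in plain Python. This is only a sketch to make the intermediate group IDs visible; `sessionize` is a hypothetical helper name, the timestamps are assumed to be H.MM strings on one day, and the first row gets flag 0 in the same way that LAG returns NULL for it.

```python
from datetime import datetime, timedelta
from itertools import accumulate

def sessionize(stamps, gap_minutes=30):
    gap = timedelta(minutes=gap_minutes)
    times = sorted(datetime.strptime(s, "%H.%M") for s in stamps)
    # Step 1: gap flag per row, comparing each row with the previous one (like LAG)
    flags = [0] + [1 if cur - prev > gap else 0
                   for prev, cur in zip(times, times[1:])]
    # Step 2: running total of the flags gives one group id per session
    group_ids = list(accumulate(flags))[:len(times)]
    # Step 3: min and max per group id; rows are sorted, so the first row seen
    # in a group is its start and the last row seen is its end
    bounds = {}
    for gid, t in zip(group_ids, times):
        start, _ = bounds.get(gid, (t, t))
        bounds[gid] = (start, t)
    show = lambda t: f"{t.hour}.{t.minute:02d}"
    return group_ids, [(show(s), show(e)) for s, e in bounds.values()]

sample = ["9.00", "9.10", "9.20", "9.40", "9.43",
          "10.30", "10.45", "11.25", "12.30", "12.33"]
group_ids, sessions = sessionize(sample)
print(group_ids)
print(sessions)
```

The running count only makes sense when the rows are processed in time order, which is why the sketch sorts first.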