How to group timestamps into islands (based on arbitrary gap)?
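Consider this list of dates as timestamptz:
I grouped the dates by hand using colors: every group is separated from the next by a gap of at least 2 minutes.
I'm trying to measure how much a given user studied by looking at when they performed an action (the data records when they finished studying a sentence). For example, in the yellow block, I'd consider that the user studied in one sitting, from 14:24 to 14:27, or roughly 3 minutes in a row.
I can see how to group these dates with a programming language, by going through all of the dates and checking the gap between each pair of consecutive rows.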
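For reference, the loop-based approach could look roughly like the Python sketch below. The function name `group_into_islands` and the sample timestamps are made up for illustration; they are not the data from the question.

```python
from datetime import datetime, timedelta

def group_into_islands(timestamps, gap=timedelta(minutes=2)):
    """Split timestamps into islands separated by a gap of at least `gap`."""
    islands = []
    for ts in sorted(timestamps):
        # Start a new island on the first row, or when the previous row
        # is at least `gap` away; otherwise extend the current island.
        if not islands or ts - islands[-1][-1] >= gap:
            islands.append([ts])
        else:
            islands[-1].append(ts)
    return islands

# Made-up sample data (assumption), not the timestamps from the question.
sample = [
    datetime(2015, 5, 13, 14, 24, 10),
    datetime(2015, 5, 13, 14, 25, 5),
    datetime(2015, 5, 13, 14, 27, 0),
    datetime(2015, 5, 13, 14, 31, 30),
    datetime(2015, 5, 13, 14, 32, 0),
]
islands = group_into_islands(sample)
for island in islands:
    print(island[0].time(), "to", island[-1].time(), f"({len(island)} rows)")
```

With this sample, the first three rows end up in one island and the last two in another, because only the jump from 14:27:00 to 14:31:30 is 2 minutes or more.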
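My question is: how would I go about grouping dates in this way with Postgres?
(Searching for 'gaps' on Google or SO brings up too many irrelevant results; I think I'm missing the vocabulary for what I'm trying to do here.)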
This would do it:
SELECT done, count(*) FILTER (WHERE step) OVER (ORDER BY done) AS grp
FROM  (
   SELECT done
        , (lag(done) OVER (ORDER BY done) <= done - interval '2 min') AS step
   FROM   tbl
   ) sub
ORDER  BY done;
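The subquery sub records step as true if the previous row is at least 2 minutes away, with rows sorted by the timestamp column done itself in this case. For the first row, lag() has no previous row and returns NULL, so step is NULL and is not counted.
The outer query adds a running count of steps, which is effectively the group number (grp), by combining the aggregate FILTER clause with another window function.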
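To make the two stages easier to follow, here is a rough Python analogue of the same logic: first a boolean step per row, then a running count of the true steps. It is only an illustration of the idea, not how Postgres evaluates the query; `island_numbers` and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import accumulate

def island_numbers(done_values, gap=timedelta(minutes=2)):
    """Return (done, grp) pairs, mimicking the lag() + running-count query."""
    done_sorted = sorted(done_values)
    # Stage 1: step is true when the previous row is at least `gap` away.
    # The first row has no previous row (NULL in SQL, which FILTER skips),
    # so it is represented as False here.
    steps = [False] + [
        prev <= cur - gap for prev, cur in zip(done_sorted, done_sorted[1:])
    ]
    # Stage 2: running count of true steps, i.e. the group number.
    grps = list(accumulate(int(s) for s in steps))
    return list(zip(done_sorted, grps))

# Made-up sample data (assumption), not the rows from the question.
rows = island_numbers([
    datetime(2015, 5, 13, 14, 24, 10),
    datetime(2015, 5, 13, 14, 25, 5),
    datetime(2015, 5, 13, 14, 27, 0),
    datetime(2015, 5, 13, 14, 31, 30),
    datetime(2015, 5, 13, 14, 32, 0),
])
for done, grp in rows:
    print(done, grp)
```

Group numbers start at 0 because the first row never counts as a step, which matches what the query returns for the first island.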
db<>fiddle here
Related:
- Query to find all timestamps more than a certain interval apart
- Select longest continuous sequence
- Grouping or Window
About the aggregate FILTER clause:
- How can I simplify this game statistics query?
- Conditional lead/lag function PostgreSQL?
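Building on Erwin's answer, here is the full query for tallying up the amount of time people spent in those sessions/islands:
My data only shows when people finished reviewing something, not when they started, which means we don't know when a session truly began; and some islands contain only one timestamp (leading to an estimated duration of 0). I account for both by calculating the average review time and adding it, once per island, to the total duration of the islands.
This is probably very specific to my use case, but I learned a thing or two in the process, so maybe this will help someone down the line.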
-- Returns estimated total study time and average time per review, both in seconds
-- add total logged time to estimate for first-review-in-island and 1-review islands
SELECT (EXTRACT(EPOCH FROM logged) + countofislands * avgreviewtime) AS totalstudytime
     , avgreviewtime
FROM (
   SELECT  -- get the three key values that will let us calculate total time spent
          sum(duration) AS logged
        , count(island) AS countofislands
        , EXTRACT(EPOCH FROM sum(duration) FILTER (WHERE duration != '00:00:00'::interval))
          / (  sum(reviews)   FILTER (WHERE duration != '00:00:00'::interval)
             - count(reviews) FILTER (WHERE duration != '00:00:00'::interval)) AS avgreviewtime
   FROM (
      SELECT island, age(max(done), min(done)) AS duration, count(island) AS reviews  -- calculate the duration of islands
      FROM (
         SELECT done, count(*) FILTER (WHERE step) OVER (ORDER BY done) AS island  -- give a unique number to each island
         FROM (
            SELECT  -- detect the beginning of islands
                   done
                 , (lag(done) OVER (ORDER BY done) <= done - interval '2 min') AS step
            FROM   review
            WHERE  clicker_id = 71
            AND    "done" > '2015-05-13'
            AND    "done" < '2015-05-13 15:00:00'  -- keep the queries small and fast for now
            ) sub
         ORDER BY done
         ) grouped
      GROUP BY island
      ) sessions
   ) summary
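The arithmetic behind the query above may be easier to check on a small example. The Python sketch below reproduces it under the assumption that each island has already been reduced to a (duration, reviews) pair; `estimate_study_time` and the sample numbers are invented for illustration.

```python
from datetime import timedelta

def estimate_study_time(islands):
    """islands: list of (duration, reviews) pairs, one pair per island.

    Returns (total_study_time, avg_review_time) in seconds, or (None, None)
    when no island has a non-zero duration (the SQL would yield NULL).
    """
    # Total logged time across all islands, in seconds.
    logged = sum((d for d, _ in islands), timedelta()).total_seconds()
    # Only islands with a non-zero duration can tell us how long a review takes.
    multi = [(d, n) for d, n in islands if d != timedelta(0)]
    if not multi:
        return None, None
    # An island with n reviews spans n - 1 measurable review intervals.
    intervals = sum(n for _, n in multi) - len(multi)
    avg_review_time = sum((d for d, _ in multi), timedelta()).total_seconds() / intervals
    # Add one average review per island for the unobserved first review.
    total = logged + len(islands) * avg_review_time
    return total, avg_review_time

# Made-up islands (assumption): 4 reviews over 3 min, a single review, 3 reviews over 1 min.
sample = [
    (timedelta(minutes=3), 4),
    (timedelta(0), 1),
    (timedelta(minutes=1), 3),
]
total, avg = estimate_study_time(sample)
print(total, avg)  # 384.0 48.0
```

Here 240 logged seconds are spread over 5 measurable intervals, giving 48 seconds per review; adding one such review for each of the 3 islands yields 384 seconds.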