Count sessions per minute derived from start and end timespans
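I have a table of user activity records, each covering a span given by a start and an end time. I want the number of users active in the system per unit of time over the previous day.
Sessions last at most one hour and never cross an hour boundary. A session can end and a new one can begin within the same minute.
Here is a simplified version of the query: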
with minutes AS (
-- ignore this...it generates a day's worth of timestamps for each minute
-- it's hairy but is what I'm stuck with on redshift
select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute
from seed_table limit 1440
),
sessions as (
select sid, ts_start, ts_end
from user_sessions s
where ts_end >= sysdate::date-'1 day'::interval
and ts_start < sysdate::date
)
select m.minute, count(distinct(s.sid))
from minutes m
left join sessions s on s.ts_end >= m.minute and s.ts_start < m.minute+'1 min'::interval
group by 1
I'm trying to avoid that nasty left join:
-> XN Nested Loop Left Join DS_BCAST_INNER (cost=6913826151.95..4727012848741.55 rows=410434560 width=166)
Join Filter: (("inner".ts_start < ("outer"."minute" + '00:01:00'::interval)) AND ("inner".ts_end >= "outer"."minute"))
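Following Gordon Linoff's answer, here is what almost works for me. It undercounts when a user's sessions transition into each other within a minute, but it seems to be the right direction. The original query can overcount for the same reason, but there the distinct count of session IDs within each minute takes care of it.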
select minute, sum(count) over (order by minute rows unbounded preceding) as users
from (
select minute, sum(count) as count
from (
(
select date_trunc('minute', ts_start) as minute, count(*) as count
from sessions
group by 1
) union all (
select date_trunc('minute', ts_end) as minute, - count(*) as count
from sessions
group by 1
)
) s1
group by minute
) s2
order by minute;
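To make the undercount concrete, here is a minimal Python sketch (not Redshift SQL) that compares the overlap predicate from the left join with the running total of start/end deltas. The two sample sessions and the helper names `overlap_counts` and `running_counts` are made up for illustration; unlike the SQL, the sketch also reports minutes that have no events.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import accumulate

# Hypothetical sample data: (sid, ts_start, ts_end)
T = datetime(2024, 1, 1, 10, 0, 0)
sessions = [
    ("a", T + timedelta(seconds=10), T + timedelta(minutes=2, seconds=30)),
    ("b", T + timedelta(minutes=1, seconds=5), T + timedelta(minutes=1, seconds=40)),
]
minutes = [T + timedelta(minutes=i) for i in range(4)]  # 10:00 .. 10:03


def trunc_minute(ts):
    """Rough equivalent of date_trunc('minute', ts)."""
    return ts.replace(second=0, microsecond=0)


def overlap_counts(sessions, minutes):
    """Distinct sids overlapping each minute (same predicate as the left join)."""
    one = timedelta(minutes=1)
    return [
        len({sid for sid, start, end in sessions if end >= m and start < m + one})
        for m in minutes
    ]


def running_counts(sessions, minutes):
    """+1 in the start minute, -1 in the end minute, then a cumulative sum."""
    delta = Counter()
    for _, start, end in sessions:
        delta[trunc_minute(start)] += 1
        delta[trunc_minute(end)] -= 1
    return list(accumulate(delta[m] for m in minutes))


print(overlap_counts(sessions, minutes))  # [1, 2, 1, 0]
print(running_counts(sessions, minutes))  # [1, 1, 0, 0]
```

Session `b` starts and ends inside 10:01, so its +1 and -1 cancel, and session `a` is already subtracted in 10:02 even though it is still active for part of that minute. Both effects push the running total below the overlap count.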
For comparison, here are the timings on one hour of data:
- Original query: 81301.345 ms
- Running-sum query: 36242.342 ms
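You can do this faster by counting the starts and stops in each minute and then taking the cumulative sum. The result looks something like this: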
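-- the per-minute deltas are summed by the group by, then accumulated by the window
select minute, sum(sum(cnt)) over (order by minute rows unbounded preceding) as users
from ((select date_trunc('minute', ts_start) as minute, count(*) as cnt
from sessions
group by 1
) union all
(select date_trunc('minute', ts_end) as minute, - count(*) as cnt
from sessions
group by 1
)
) s
group by minute
order by minute;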
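As a rough way to try the running-total idea without a Redshift cluster, the sketch below runs the same query shape on an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer: SQLite stands in for Redshift, `strftime` replaces `date_trunc`, the parentheses around the `union all` branches are dropped because SQLite does not accept them, and the three sample sessions are invented. It needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sessions (sid text, ts_start text, ts_end text)")
conn.executemany(
    "insert into sessions values (?, ?, ?)",
    [
        # Hypothetical sample rows
        ("a", "2024-01-01 10:00:10", "2024-01-01 10:02:30"),
        ("b", "2024-01-01 10:01:05", "2024-01-01 10:03:40"),
        ("c", "2024-01-01 10:02:20", "2024-01-01 10:02:50"),
    ],
)

# +count per start minute, -count per end minute, summed per minute by the
# outer group by, then accumulated in minute order by the window function.
QUERY = """
select minute,
       sum(sum(cnt)) over (order by minute rows unbounded preceding) as users
from (
    select strftime('%Y-%m-%d %H:%M:00', ts_start) as minute, count(*) as cnt
    from sessions
    group by 1
    union all
    select strftime('%Y-%m-%d %H:%M:00', ts_end), -count(*)
    from sessions
    group by 1
) s
group by minute
order by minute
"""

rows = conn.execute(QUERY).fetchall()
for minute, users in rows:
    print(minute, users)
```

Minute 10:02 appears in both branches of the union (one start, two ends), which is why the outer `group by minute` is needed before the cumulative sum. Only minutes with at least one start or end produce a row, so gaps would still have to be filled from a minutes table if every minute must appear.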