PostgreSQL: count the maximum number of concurrent user sessions per hour
Situation
We have a PostgreSQL 9.1 database containing user sessions, with a login date/time and a logout date/time per row. The table looks like this:
user_id | login_ts            | logout_ts
--------+---------------------+--------------------
USER1   | 2021-02-03 09:23:00 | 2021-02-03 11:44:00
USER2   | 2021-02-03 10:49:00 | 2021-02-03 13:30:00
USER3   | 2021-02-03 13:32:00 | 2021-02-03 15:31:00
USER4   | 2021-02-04 13:50:00 | 2021-02-04 14:53:00
USER5   | 2021-02-04 14:44:00 | 2021-02-04 15:21:00
USER6   | 2021-02-04 14:52:00 | 2021-02-04 17:59:00
Goal
I would like to get the maximum number of concurrent users for each of the 24 hours of every day in a time range. Like this:
date | hour | sessions
-----------+-------+-----------
2021-02-03 | 01:00 | 0
2021-02-03 | 02:00 | 0
2021-02-03 | 03:00 | 0
2021-02-03 | 04:00 | 0
2021-02-03 | 05:00 | 0
2021-02-03 | 06:00 | 0
2021-02-03 | 07:00 | 0
2021-02-03 | 08:00 | 0
2021-02-03 | 09:00 | 1
2021-02-03 | 10:00 | 2
2021-02-03 | 11:00 | 2
2021-02-03 | 12:00 | 1
2021-02-03 | 13:00 | 1
2021-02-03 | 14:00 | 1
2021-02-03 | 15:00 | 0
2021-02-03 | 16:00 | 0
2021-02-03 | 17:00 | 0
2021-02-03 | 18:00 | 0
2021-02-03 | 19:00 | 0
2021-02-03 | 20:00 | 0
2021-02-03 | 21:00 | 0
2021-02-03 | 22:00 | 0
2021-02-03 | 23:00 | 0
2021-02-03 | 24:00 | 0
2021-02-04 | 01:00 | 0
2021-02-04 | 02:00 | 0
2021-02-04 | 03:00 | 0
2021-02-04 | 04:00 | 0
2021-02-04 | 05:00 | 0
2021-02-04 | 06:00 | 0
2021-02-04 | 07:00 | 0
2021-02-04 | 08:00 | 0
2021-02-04 | 09:00 | 0
2021-02-04 | 10:00 | 0
2021-02-04 | 11:00 | 0
2021-02-04 | 12:00 | 0
2021-02-04 | 13:00 | 1
2021-02-04 | 14:00 | 3
2021-02-04 | 15:00 | 1
2021-02-04 | 16:00 | 1
2021-02-04 | 17:00 | 1
2021-02-04 | 18:00 | 0
2021-02-04 | 19:00 | 0
2021-02-04 | 20:00 | 0
2021-02-04 | 21:00 | 0
2021-02-04 | 22:00 | 0
2021-02-04 | 23:00 | 0
2021-02-04 | 24:00 | 0
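Notes
- "Concurrent" means at the same point in time. So USER2 and USER3 do not overlap at 13:00, but USER4 and USER6 do overlap at 14:00, even though they overlap for only 1 minute.
- A user session can span multiple hours and is therefore counted in every hour it is part of.
- Each user can only be online once at any point in time.
- If there are no users in a particular hour, the query should return 0.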
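Similar question
A similar question was answered here by Erwin Brandstetter. However, that answer works per day rather than per hour, and I am obviously new to PostgreSQL and could not convert it to per hour, so I hope someone can help.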
For any time period, you can count the number of concurrent sessions with the SQL OVERLAPS operator:
CREATE TEMP TABLE sessions (
user_id text not null,
login_ts timestamp,
logout_ts timestamp );
INSERT INTO sessions SELECT 'webuser', d,
d+((1+random()*300)::text||' seconds')::interval
FROM generate_series(
'2021-02-28 07:42'::timestamp,
'2021-03-01 07:42'::timestamp,
'5 seconds'::interval) AS d;
SELECT s1.user_id, s1.login_ts, s1.logout_ts,
(select count(*) FROM sessions s2
WHERE (s2.login_ts, s2.logout_ts) OVERLAPS (s1.login_ts, s1.logout_ts))
AS parallel_sessions
FROM sessions s1 LIMIT 10;
user_id | login_ts | logout_ts | parallel_sessions
---------+---------------------+----------------------------+------------------
webuser | 2021-02-28 07:42:00 | 2021-02-28 07:42:25.528594 | 6
webuser | 2021-02-28 07:42:05 | 2021-02-28 07:45:50.513769 | 47
webuser | 2021-02-28 07:42:10 | 2021-02-28 07:44:18.810066 | 28
webuser | 2021-02-28 07:42:15 | 2021-02-28 07:45:17.3888 | 40
webuser | 2021-02-28 07:42:20 | 2021-02-28 07:43:14.325476 | 15
webuser | 2021-02-28 07:42:25 | 2021-02-28 07:43:44.174841 | 21
webuser | 2021-02-28 07:42:30 | 2021-02-28 07:43:32.679052 | 18
webuser | 2021-02-28 07:42:35 | 2021-02-28 07:45:12.554117 | 38
webuser | 2021-02-28 07:42:40 | 2021-02-28 07:46:37.94311 | 55
webuser | 2021-02-28 07:42:45 | 2021-02-28 07:43:08.398444 | 13
(10 rows)
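The OVERLAPS test used above comes down to a comparison of endpoints. Below is a minimal Python sketch of that comparison, assuming half-open periods (start <= t < end) with start before end. The names `overlaps` and `ts` are hypothetical helpers for illustration, not part of PostgreSQL.

```python
from datetime import datetime


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def overlaps(start1, end1, start2, end2):
    """True if [start1, end1) and [start2, end2) share at least one instant.

    Assumes start < end for both periods.
    """
    return start1 < end2 and start2 < end1


# USER4 and USER6 from the question share the minute 14:52 to 14:53.
user4_user6 = overlaps(ts("2021-02-04 13:50:00"), ts("2021-02-04 14:53:00"),
                       ts("2021-02-04 14:52:00"), ts("2021-02-04 17:59:00"))

# USER2 logs out at 13:30 and USER3 logs in at 13:32, so they never meet.
user2_user3 = overlaps(ts("2021-02-03 10:49:00"), ts("2021-02-03 13:30:00"),
                       ts("2021-02-03 13:32:00"), ts("2021-02-03 15:31:00"))

print(user4_user6, user2_user3)  # True False
```

Note that the SQL query compares every session with every session, including itself, so each count there includes the session being examined.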
This works for small data sets, but for better performance use PostgreSQL range types, as shown below. This works on PostgreSQL 9.2 and later.
ALTER TABLE sessions ADD timerange tsrange;
UPDATE sessions SET timerange = tsrange(login_ts,logout_ts);
CREATE INDEX ON sessions USING gist (timerange);
CREATE TEMP TABLE level1 AS
SELECT s1.user_id, s1.login_ts, s1.logout_ts,
(select count(*) FROM sessions s2
WHERE s2.timerange && s1.timerange) AS parallel_sessions
FROM sessions s1;
SELECT date_trunc('hour',login_ts) AS hour, count(*),
max(parallel_sessions)
FROM level1
GROUP BY hour;
hour | count | max
---------------------+-------+-----
2021-02-28 14:00:00 | 720 | 98
2021-03-01 03:00:00 | 720 | 99
2021-03-01 06:00:00 | 720 | 94
2021-02-28 09:00:00 | 720 | 96
2021-02-28 10:00:00 | 720 | 97
2021-02-28 18:00:00 | 720 | 94
2021-02-28 11:00:00 | 720 | 97
2021-03-01 00:00:00 | 720 | 97
2021-02-28 19:00:00 | 720 | 99
2021-02-28 16:00:00 | 720 | 94
2021-02-28 17:00:00 | 720 | 95
2021-03-01 02:00:00 | 720 | 99
2021-02-28 08:00:00 | 720 | 96
2021-02-28 23:00:00 | 720 | 94
2021-03-01 07:00:00 | 505 | 92
2021-03-01 04:00:00 | 720 | 95
2021-02-28 21:00:00 | 720 | 97
2021-03-01 01:00:00 | 720 | 93
2021-02-28 22:00:00 | 720 | 96
2021-03-01 05:00:00 | 720 | 93
2021-02-28 20:00:00 | 720 | 95
2021-02-28 13:00:00 | 720 | 95
2021-02-28 12:00:00 | 720 | 97
2021-02-28 15:00:00 | 720 | 98
2021-02-28 07:00:00 | 216 | 93
(25 rows)
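I would break this into two problems:
- Find the number of overlaps and when they start and end.
- Find the hours.
Note two things:
- I assume that '2014-04-03 17:59:00' is a typo.
- The following puts the date/hour in a single column, at the start of the hour.
First, calculate the overlaps. For this, unpivot the logins and logouts. Put in a counter of +1 for each login and -1 for each logout, and take the cumulative sum. This looks like: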
with overlap as (
select v.ts, sum(v.inc) as inc,
sum(sum(v.inc)) over (order by v.ts) as num_overlaps,
lead(v.ts) over (order by v.ts) as next_ts
from sessions s cross join lateral
(values (login_ts, 1), (logout_ts, -1)) v(ts, inc)
group by v.ts
)
select *
from overlap
order by ts;
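The +1/-1 running total in the query above can also be traced outside the database. The Python below is only an illustrative sketch of the same idea, not a replacement for the SQL; `ts` and `running_overlaps` are hypothetical names, and the data is the 2021-02-04 part of the question's table.

```python
from collections import defaultdict
from datetime import datetime
from itertools import accumulate


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def running_overlaps(sessions):
    """Return [(timestamp, sessions open from that timestamp on)], sorted by time."""
    delta = defaultdict(int)
    for login, logout in sessions:
        delta[login] += 1    # a login opens a session
        delta[logout] -= 1   # a logout closes one
    times = sorted(delta)    # one entry per distinct timestamp, like GROUP BY v.ts
    totals = accumulate(delta[t] for t in times)  # cumulative sum in time order
    return list(zip(times, totals))


# The three sessions of 2021-02-04 from the question (USER4, USER5, USER6).
day2 = [
    (ts("2021-02-04 13:50:00"), ts("2021-02-04 14:53:00")),
    (ts("2021-02-04 14:44:00"), ts("2021-02-04 15:21:00")),
    (ts("2021-02-04 14:52:00"), ts("2021-02-04 17:59:00")),
]

for when, count in running_overlaps(day2):
    print(when, count)
```

The counts run 1, 2, 3, 2, 1, 0: the peak of 3 holds only between 14:52 and 14:53, which is the one-minute overlap mentioned in the question.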
For the next step, use generate_series() to generate timestamps one hour apart. Then use left join and group by to find the maximum within each period:
with overlap as (
select v.ts, sum(v.inc) as inc,
sum(sum(v.inc)) over (order by v.ts) as num_overlaps,
lead(v.ts) over (order by v.ts) as next_ts
from sessions s cross join lateral
(values (login_ts, 1), (logout_ts, -1)) v(ts, inc)
group by v.ts
)
select gs.hh, coalesce(max(o.num_overlaps), 0) as num_overlaps
from generate_series('2021-02-03'::date, '2021-02-05'::date, interval '1 hour') gs(hh) left join
overlap o
on o.ts < gs.hh + interval '1 hour' and
o.next_ts > gs.hh
group by gs.hh
order by gs.hh;
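To check what the hourly query should return, the same bucketing can be reproduced in plain Python. This is a sketch under assumptions: `hourly_max` and `ts` are hypothetical names, and the data is copied from the question's table. Like the join condition above, it keeps every segment that starts before the hour ends and whose successor starts after the hour begins.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def hourly_max(sessions, first_hour, last_hour):
    """Map each hour start in [first_hour, last_hour] to its peak concurrency."""
    delta = defaultdict(int)
    for login, logout in sessions:
        delta[login] += 1
        delta[logout] -= 1
    times = sorted(delta)

    # Build (start, next_start, count) segments, like ts / next_ts / num_overlaps.
    segments = []
    running = 0
    for i, start in enumerate(times):
        running += delta[start]
        next_start = times[i + 1] if i + 1 < len(times) else None
        segments.append((start, next_start, running))

    one_hour = timedelta(hours=1)
    result = {}
    hour = first_hour
    while hour <= last_hour:
        counts = [n for start, next_start, n in segments
                  if start < hour + one_hour
                  and next_start is not None and next_start > hour]
        result[hour] = max(counts, default=0)  # 0 plays the role of COALESCE
        hour += one_hour
    return result


SESSIONS = [
    (ts("2021-02-03 09:23:00"), ts("2021-02-03 11:44:00")),
    (ts("2021-02-03 10:49:00"), ts("2021-02-03 13:30:00")),
    (ts("2021-02-03 13:32:00"), ts("2021-02-03 15:31:00")),
    (ts("2021-02-04 13:50:00"), ts("2021-02-04 14:53:00")),
    (ts("2021-02-04 14:44:00"), ts("2021-02-04 15:21:00")),
    (ts("2021-02-04 14:52:00"), ts("2021-02-04 17:59:00")),
]

peaks = hourly_max(SESSIONS, ts("2021-02-03 00:00:00"), ts("2021-02-05 00:00:00"))
print(peaks[ts("2021-02-03 10:00:00")])  # 2
print(peaks[ts("2021-02-04 14:00:00")])  # 3
```

The loop over all segments for every hour is quadratic in the worst case; it is kept that way here only to stay close to the shape of the join.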
Here is a db<>fiddle using your data, with the last record's logout time fixed to a reasonable value.