Count max. number of concurrent user sessions per day

Question

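Situation

We have a PostgreSQL 8.4 database containing user sessions, with a login date/time and a logout date/time per row. Our web application records these values and also handles the case where a user does not log out explicitly (session timeout). So a login date/time and a logout date/time are given in every case.

Goal

I need user statistics on the maximum number of concurrent sessions per day, so that I can say something like: "At 2015-03-16 the peak of concurrent users logged in was six."

Similar questions

A similar question has already been answered here: SQL max concurrent sessions per hour of day. However, I could not adapt that solution to my case: I would like a result table that shows the max. number of concurrent user sessions per day rather than per hour. The table schema is also slightly different, because in my case one row contains both the login and the logout date/time, whereas in that example each row represents either a login or a logout. In addition, that question is based on an MS SQL database environment rather than PostgreSQL.

Notes

Sessions of different users can overlap
A user may have duplicate sessions, which should be counted only once (grouped by user name)
The table schema of the sessions table is shown below

Table schema: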

user_id     |  login_date  |  login_time  |  logout_date  |  logout_time
------------+--------------+--------------+---------------+-------------
USER32      |  2014-03-03  |    08:23:00  |   2014-03-03  |     14:44:00
USER82      |  2014-03-03  |    08:49:00  |   2014-03-03  |     17:18:00
USER83      |  2014-03-03  |    09:40:00  |   2014-03-03  |     17:31:00
USER36      |  2014-03-03  |    09:50:00  |   2014-03-03  |     16:10:00
USER37      |  2014-03-03  |    11:44:00  |   2014-03-03  |     15:21:00
USER72      |  2014-03-03  |    12:52:00  |   2014-03-03  |     12:55:00

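Example

The following example, rendered as a timeline with the Google Charts API, should help to understand the problem: http://i.imgur.com/ZOjnLll.png

Take the day 2015-03-03 for example: all users except USER78 (6 users) are logged in at the same time between 12:52 and 12:55 on that day. This is the maximum number of users logged in simultaneously, and I need this statistic for each day in a given time range.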

Day         |   MaxNumberOfConcurrentSessions 
------------+--------------------------------
2015-03-01  |                 2 
2015-03-02  |                 3
2015-03-03  |                 6
...

Google Charts API code for the timeline screenshot above:

google.setOnLoadCallback(drawChart);
function drawChart() {

  var container = document.getElementById('example5.1');
  var chart = new google.visualization.Timeline(container);
  var dataTable = new google.visualization.DataTable();
  dataTable.addColumn({ type: 'string', id: 'Room' });
  dataTable.addColumn({ type: 'string', id: 'Name' });
  dataTable.addColumn({ type: 'date', id: 'Start' });
  dataTable.addColumn({ type: 'date', id: 'End' });
  dataTable.addRows([
// JavaScript months are zero-based (2 = March); leading zeros are dropped
// because literals such as 08 and 09 are rejected in strict mode.
["USER78", '', new Date(2014,2,3,20,38), new Date(2014,2,3,21,14)],
["USER83", '', new Date(2014,2,3,9,40), new Date(2014,2,3,17,31)],
["USER72", '', new Date(2014,2,3,8,43), new Date(2014,2,3,8,43)],
["USER72", '', new Date(2014,2,3,9,40), new Date(2014,2,3,9,40)],
["USER72", '', new Date(2014,2,3,10,3), new Date(2014,2,3,10,6)],
["USER72", '', new Date(2014,2,3,12,52), new Date(2014,2,3,12,55)],
["USER72", '', new Date(2014,2,3,21,13), new Date(2014,2,3,21,13)],
["USER72", '', new Date(2014,2,3,21,37), new Date(2014,2,3,21,38)],
["USER72", '', new Date(2014,2,3,23,14), new Date(2014,2,3,23,15)],
["USER72", '', new Date(2014,2,3,23,27), new Date(2014,2,3,23,28)],
["USER36", '', new Date(2014,2,3,8,5), new Date(2014,2,3,9,17)],
["USER36", '', new Date(2014,2,3,9,50), new Date(2014,2,3,16,10)],
["USER36", '', new Date(2014,2,3,16,12), new Date(2014,2,3,20,29)],
["USER32", '', new Date(2014,2,3,8,23), new Date(2014,2,3,14,44)],
["USER82", '', new Date(2014,2,3,8,49), new Date(2014,2,3,17,18)],
["USER37", '', new Date(2014,2,3,8,4), new Date(2014,2,3,8,6)],
["USER37", '', new Date(2014,2,3,11,44), new Date(2014,2,3,15,21)],
["USER37", '', new Date(2014,2,3,15,34), new Date(2014,2,3,15,51)],
["USER37", '', new Date(2014,2,3,16,12), new Date(2014,2,3,16,14)],
["USER37", '', new Date(2014,2,3,16,52), new Date(2014,2,3,16,54)],
["USER37", '', new Date(2014,2,3,17,7), new Date(2014,2,3,17,8)],
["USER37", '', new Date(2014,2,3,20,20), new Date(2014,2,3,20,24)],
["USER37", '', new Date(2014,2,3,21,3), new Date(2014,2,3,21,20)],
["USER37", '', new Date(2014,2,3,22,42), new Date(2014,2,3,23,5)],
["USER37", '', new Date(2014,2,3,23,51), new Date(2014,2,3,23,56)],
["USER01", '', new Date(2014,2,3,16,11), new Date(2014,2,3,16,12)]
]);

  var options = {
    timeline: { colorByRowLabel: true }
  };

  chart.draw(dataTable, options);
}

<script type="text/javascript" src="https://www.google.com/jsapi?autoload={'modules':[{'name':'visualization',
       'version':'1','packages':['timeline']}]}"></script>
<div id="example5.1" style="width:5000px;height: 600px;"></div>

Answer 1

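I would serialize logins and logouts with UNION ALL, with "in" counting as 1 and "out" counting as -1. Then compute a running count with a simple window function and take the maximum per day.

Since it is not specified, I am assuming:

"Concurrent" means at the same point in time (not just on the same day).
Sessions can span any range of time (i.e. multiple days, too).
Each user can only be online once at any point in time. So no grouping per user is needed in my solution.
Logout trumps login. If both occur at the same time, the logout is counted first (leading to a lower concurrent number in corner cases).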

WITH range AS (SELECT '2014-03-01'::date AS start_date  -- time range
                    , '2014-03-31'::date AS end_date)   -- inclusive bounds
, cte AS (
   SELECT *
   FROM   tbl, range r
   WHERE  login_date  <= r.end_date
   AND    logout_date >= r.start_date
   )
, ct AS (
   SELECT log_date, sum(ct) OVER (ORDER BY log_date, log_time, ct) AS session_ct
   FROM  (
      SELECT logout_date AS log_date, logout_time AS log_time, -1 AS ct FROM cte
      UNION ALL
      SELECT login_date, login_time, 1 FROM cte
      ) sub
   )
SELECT log_date, max(session_ct) AS max_sessions
FROM   ct, range r
WHERE  log_date BETWEEN r.start_date AND r.end_date  -- crop actual time range
GROUP  BY 1
ORDER  BY 1;
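
To make the +1/-1 idea easier to follow outside the database, here is a minimal Python sketch of the same sweep. It is an illustration under assumptions, not a translation of the query: the function name max_concurrent_per_day is made up, sessions are passed as (login, logout) datetime pairs, events are processed one at a time, and the cropping to a time range is left out.

```python
from collections import defaultdict
from datetime import datetime


def max_concurrent_per_day(sessions):
    """Return {date: peak running count} for (login, logout) datetime pairs.

    Hypothetical helper for illustration only.
    """
    events = []
    for login, logout in sessions:
        events.append((login, 1))    # "in" counts as +1
        events.append((logout, -1))  # "out" counts as -1
    # Tuples sort by timestamp first; at equal timestamps -1 sorts before +1,
    # so a logout is counted before a login that happens at the same time.
    events.sort()

    peaks = defaultdict(int)
    running = 0
    for ts, delta in events:
        running += delta  # running count of open sessions
        day = ts.date()
        peaks[day] = max(peaks[day], running)
    return dict(peaks)


if __name__ == "__main__":
    fmt = "%Y-%m-%d %H:%M"
    rows = [  # the six sample rows from the question's table
        ("2014-03-03 08:23", "2014-03-03 14:44"),
        ("2014-03-03 08:49", "2014-03-03 17:18"),
        ("2014-03-03 09:40", "2014-03-03 17:31"),
        ("2014-03-03 09:50", "2014-03-03 16:10"),
        ("2014-03-03 11:44", "2014-03-03 15:21"),
        ("2014-03-03 12:52", "2014-03-03 12:55"),
    ]
    sessions = [(datetime.strptime(a, fmt), datetime.strptime(b, fmt))
                for a, b in rows]
    for day, peak in sorted(max_concurrent_per_day(sessions).items()):
        print(day, peak)
```

For the six sample rows this reports a peak of 6 on 2014-03-03, reached when the last session opens at 12:52. Like the query, it only reports days on which at least one login or logout occurs.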

You could use the OVERLAPS operator in cte:

AND   (login_date, logout_date) OVERLAPS (r.start_date, r.end_date)

Details:

Find overlapping date ranges in PostgreSQL

But that might not be a good idea because (per documentation):

Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.

Note the half-open interval: the upper bound of your range would have to be the day after your desired time frame.
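
To see why the upper bound has to move by one day, here is a small Python model of the rule quoted above. It is only a sketch of the documented semantics, not Postgres code: the function name overlaps is made up, and it assumes start <= end for both periods.

```python
from datetime import date


def overlaps(s1, e1, s2, e2):
    """Model of the quoted rule, assuming s1 <= e1 and s2 <= e2.

    A period is the half-open interval start <= t < end, or the single
    instant `start` when start == end. Hypothetical helper for illustration.
    """
    if s1 == e1 and s2 == e2:   # two instants overlap only if identical
        return s1 == s2
    if s1 == e1:                # instant against half-open interval
        return s2 <= s1 < e2
    if s2 == e2:
        return s1 <= s2 < e1
    return s1 < e2 and s2 < e1  # two half-open intervals


if __name__ == "__main__":
    # A session that starts and ends on the last day of the desired range.
    login = logout = date(2014, 3, 31)
    # With the inclusive end date as upper bound the session is missed ...
    print(overlaps(login, logout, date(2014, 3, 1), date(2014, 3, 31)))
    # ... with the day after as upper bound it is found.
    print(overlaps(login, logout, date(2014, 3, 1), date(2014, 4, 1)))
```

The first call models a range ending on 2014-03-31 and does not match the session, while the second uses 2014-04-01 as the upper bound and does.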

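Explain

CTEs are available since Postgres 8.4.
The 1st CTE range is just for convenience, to provide the time range once.
The 2nd CTE cte selects only the relevant rows: those that ...
- start before or in the range
- and end in or after the range
The 3rd CTE ct serializes "in" and "out" points with values of +/-1 and computes a running count with the aggregate function sum() used as a window function. Window functions are also available since Postgres 8.4.
The final SELECT trims leading and trailing days and aggregates the maximum per day. Voilà.

SQL Fiddle for Postgres 9.6.
Postgres 8.4 is too old and no longer available there, but it should work the same. I added a row to the test case, one spanning multiple days. That should make it more useful.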

Notes

I would generally use timestamp instead of date and time. Same size, easier to handle. Or timestamptz if multiple time zones can be involved.
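
As a rough illustration of why a single timestamp is easier to handle than a date plus a time, here is a short Python sketch. The helper name session_length and the sample values are assumptions for this example, not part of the schema above.

```python
from datetime import date, datetime, time


def session_length(login_date, login_time, logout_date, logout_time):
    """Combine the separate date and time parts, then subtract.

    Hypothetical helper for illustration only.
    """
    login = datetime.combine(login_date, login_time)
    logout = datetime.combine(logout_date, logout_time)
    return logout - login  # a timedelta, correct across midnight too


if __name__ == "__main__":
    # A made-up session that crosses midnight.
    print(session_length(date(2014, 3, 3), time(23, 51),
                         date(2014, 3, 4), time(0, 10)))
```

With separate columns, every comparison or subtraction has to recombine the two parts first; with one timestamp per event that step disappears.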

At a bare minimum, an index on (login_date, logout_date DESC) helps performance.

Answer 2

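My idea so far:

First, find all possible overlaps between sessions (i.e. an "inner join" with the overlap condition "(s1.login_time, s1.logout_time) OVERLAPS (s2.login_time, s2.logout_time)").
Then find the maximum number of concurrent sessions based on the smallest common time span (see the last part of the WHERE clause, "s1.login_time >= s2.login_time AND s1.logout_time <= s2.logout_time").

The SQL statement looks like this: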

SELECT report_date, MAX(concurrent_sessions) AS max_concurrent_sessions FROM(
  SELECT report_date, session_id, count(session_id) as concurrent_sessions from (
    SELECT s1.id AS session_id, s1.user_id, s1.login_date AS report_date, s1.login_time,
           s1.logout_date, s1.logout_time,
           s2.id, s2.user_id, s2.login_date, s2.login_time, s2.logout_date, s2.logout_time
    FROM sessions s1
    INNER JOIN sessions s2 ON s1.login_date = s2.login_date
    WHERE s1.login_date BETWEEN '2014-03-01' AND '2014-03-31'
      AND (s1.login_time, s1.logout_time) OVERLAPS (s2.login_time, s2.logout_time)
      AND s1.login_time >= s2.login_time
      AND s1.logout_time <= s2.logout_time
    ORDER BY s1.id
  ) AS concurrent_overlapping_sessions 
  GROUP BY report_date, session_id 
) AS max_concurrent_overlapping_sessions
GROUP BY report_date
ORDER BY report_date

What do you think of this solution compared with the other suggested solutions (e.g. performance, correctness, etc.)?
