Remove Time Overlaps in Session Records
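I have a Sessions table with the columns (User_ID, Session_ID, LogOn, LogOut). Users can have several sessions open at the same time, and my goal is to calculate the net time each user spent on my system. I used the following query: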
SELECT
T1.User_ID,
SUM(T1.Duration) AS Duration
FROM (
SELECT
T2.User_ID,
T2.Logon,
(CASE WHEN T3.LogOn IS NULL OR T3.LogOn > T2.LogOut THEN T2.LogOut ELSE T3.LogOn END) AS LogOutEdited,
DATEDIFF(MINUTE, T2.Logon, (CASE WHEN T3.LogOn IS NULL OR T3.LogOn > T2.LogOut THEN T2.LogOut ELSE T3.LogOn END)) AS Duration
FROM (
SELECT
(DENSE_RANK() OVER (PARTITION BY User_ID ORDER BY LogOn)) AS Serial,
User_ID, LogOn, LogOut
FROM Sessions
) AS T2
LEFT JOIN (
SELECT
(DENSE_RANK() OVER (PARTITION BY User_ID ORDER BY LogOn)) AS Serial,
User_ID, LogOn, LogOut
FROM Sessions
) AS T3
ON T2.User_ID = T3.User_ID
AND T2.Serial = T3.Serial - 1
) AS T1
GROUP BY T1.User_ID
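This query compares the end of each session with the start of the next one and moves the first session's end earlier to remove the overlapping time. It does give correct results (I think :) ), but its performance is poor. Is there more efficient logic I can apply here?
Edit:
Sample data: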
--------------------------------------------------------------------
| User_ID | Session_ID | LogOn | LogOut |
--------------------------------------------------------------------
| 1 | 100 | 2020-01-01 01:00:00 | 2020-01-01 01:30:00 |
--------------------------------------------------------------------
| 1 | 101 | 2020-01-01 01:15:00 | 2020-01-01 01:45:00 |
--------------------------------------------------------------------
| 1 | 102 | 2020-01-01 01:35:00 | 2020-01-01 01:40:00 |
--------------------------------------------------------------------
| 2 | 103 | 2020-01-01 03:13:00 | 2020-01-01 03:23:00 |
--------------------------------------------------------------------
| 1 | 104 | 2020-01-01 04:00:00 | 2020-01-01 04:15:00 |
--------------------------------------------------------------------
Desired result:
----------------------
| User_ID | Duration |
----------------------
| 1 | 60 |
----------------------
| 2 | 10 |
----------------------
Undesired result:
----------------------
| User_ID | Duration |
----------------------
| 1 | 80 |
----------------------
| 2 | 10 |
----------------------
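This is a gaps-and-islands problem: you want to identify the islands (groups of overlapping sessions) and then sum their durations for each user.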
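To make the idea of an "island" concrete, here is a minimal Python sketch (not part of the SQL solution) that merges each user's overlapping sessions from the sample data and sums the merged spans. The names `SESSIONS`, `parse`, `islands_per_user` and `minutes_per_user` are made up for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (User_ID, Session_ID, LogOn, LogOut)
SESSIONS = [
    (1, 100, "2020-01-01 01:00:00", "2020-01-01 01:30:00"),
    (1, 101, "2020-01-01 01:15:00", "2020-01-01 01:45:00"),
    (1, 102, "2020-01-01 01:35:00", "2020-01-01 01:40:00"),
    (2, 103, "2020-01-01 03:13:00", "2020-01-01 03:23:00"),
    (1, 104, "2020-01-01 04:00:00", "2020-01-01 04:15:00"),
]


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def islands_per_user(rows):
    """Merge each user's overlapping sessions into (start, end) islands."""
    by_user = defaultdict(list)
    for user_id, _session_id, log_on, log_out in rows:
        by_user[user_id].append((parse(log_on), parse(log_out)))

    islands = {}
    for user_id, spans in by_user.items():
        merged = []
        for start, end in sorted(spans):
            if merged and start <= merged[-1][1]:
                # Overlaps the current island: extend its end if needed.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                # Gap before this session: start a new island.
                merged.append([start, end])
        islands[user_id] = [(s, e) for s, e in merged]
    return islands


def minutes_per_user(rows):
    """Sum the length of every island, in whole minutes, per user."""
    return {
        user_id: sum(int((e - s).total_seconds() // 60) for s, e in spans)
        for user_id, spans in islands_per_user(rows).items()
    }


print(minutes_per_user(SESSIONS))  # {1: 60, 2: 10}
```

User 1 ends up with two islands (01:00 to 01:45 and 04:00 to 04:15), which is where the desired 60 minutes comes from.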
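Here is an approach that uses lag() and a window sum() to define the groups (log_in, log_out and mytable stand in for your LogOn, LogOut and Sessions). The following query returns one row per group of overlapping sessions: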
select user_id, min(log_in) log_in, max(log_out) log_out
from (
select
t.*,
sum(case when log_in <= lag_log_out then 0 else 1 end)
over(partition by user_id order by log_in) as grp
from (
select
t.*,
lag(log_out) over(partition by user_id order by log_in) as lag_log_out
from mytable t
) t
) t
group by user_id, grp
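For readers who want to trace the grouping step by hand, the sketch below mirrors the query's logic in Python. It is only an illustration: `ROWS`, `assign_groups` and `islands` are hypothetical names, and times are written as minutes after midnight to keep it short.

```python
from itertools import groupby

# Sample data from the question as (user_id, log_in, log_out),
# with times expressed as minutes after midnight.
ROWS = [
    (1, 60, 90),    # session 100: 01:00-01:30
    (1, 75, 105),   # session 101: 01:15-01:45
    (1, 95, 100),   # session 102: 01:35-01:40
    (2, 193, 203),  # session 103: 03:13-03:23
    (1, 240, 255),  # session 104: 04:00-04:15
]


def assign_groups(rows):
    """Return (user_id, log_in, log_out, grp) rows, mirroring lag() + running sum()."""
    out = []
    # partition by user_id order by log_in
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for user_id, part in groupby(ordered, key=lambda r: r[0]):
        lag_log_out = None  # lag() yields NULL on the first row of a partition
        grp = 0
        for _, log_in, log_out in part:
            # case when log_in <= lag_log_out then 0 else 1 end
            # (a NULL lag falls through to the "else 1" branch)
            starts_new = 0 if lag_log_out is not None and log_in <= lag_log_out else 1
            grp += starts_new  # running sum of the flags
            out.append((user_id, log_in, log_out, grp))
            lag_log_out = log_out
    return out


def islands(rows):
    """group by user_id, grp -> (min(log_in), max(log_out))."""
    result = {}
    for user_id, log_in, log_out, grp in assign_groups(rows):
        lo, hi = result.get((user_id, grp), (log_in, log_out))
        result[(user_id, grp)] = (min(lo, log_in), max(hi, log_out))
    return result


for key, span in islands(ROWS).items():
    print(key, span)
# (1, 1) (60, 105)
# (1, 2) (240, 255)
# (2, 1) (193, 203)
```

Each printed key is a (user_id, grp) pair, so the three lines correspond to the three rows the query would return for the sample data.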
You can add another level of aggregation to compute the total time spent per user:
select user_id, sum(datediff(minute, log_in, log_out)) duration
from (
select user_id, min(log_in) log_in, max(log_out) log_out
from (
select
t.*,
sum(case when log_in <= lag_log_out then 0 else 1 end)
over(partition by user_id order by log_in) as grp
from (
select
t.*,
lag(log_out) over(partition by user_id order by log_in) as lag_log_out
from mytable t
) t
) t
group by user_id, grp
) t
group by user_id
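One case that may be worth checking on real data is a long session that fully contains several shorter sessions that do not overlap each other. Comparing log_in only with the previous row's log_out then starts a new group too early and counts some minutes twice. A possible variation, offered here as an assumption rather than as part of the query above, is to compare against the running maximum of all earlier log_out values, for example `max(log_out) over(partition by user_id order by log_in rows between unbounded preceding and 1 preceding)`. The Python sketch below contrasts the two rules; `total_minutes` and `NESTED` are hypothetical names and the nested rows are invented test data.

```python
def total_minutes(rows, use_running_max):
    """Total island minutes per user; rows are (user_id, log_in, log_out) in minutes."""
    islands = {}
    state = {}  # user_id -> (current group number, end value compared with the next log_in)
    for user_id, log_in, log_out in sorted(rows):
        grp, prev_end = state.get(user_id, (0, None))
        if prev_end is None or log_in > prev_end:
            grp += 1  # no overlap with what came before: new island
        lo, hi = islands.get((user_id, grp), (log_in, log_out))
        islands[(user_id, grp)] = (min(lo, log_in), max(hi, log_out))
        if use_running_max and prev_end is not None:
            prev_end = max(prev_end, log_out)  # latest end seen so far
        else:
            prev_end = log_out                 # plain lag(): previous row only
        state[user_id] = (grp, prev_end)

    totals = {}
    for (user_id, _grp), (lo, hi) in islands.items():
        totals[user_id] = totals.get(user_id, 0) + (hi - lo)
    return totals


# Invented data: one 60-minute session containing two short, separate ones.
NESTED = [(1, 60, 120), (1, 70, 80), (1, 90, 100)]

print(total_minutes(NESTED, use_running_max=False))  # {1: 70}
print(total_minutes(NESTED, use_running_max=True))   # {1: 60}
```

On the sample data from the question both rules give 60 and 10 minutes, so the difference only shows up when sessions are nested in this way.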