Trying to get start and end times in hours from a set of 30-minute time intervals, but some results aren't returned correctly

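I'm trying to build a report of hours worked from a PostgreSQL database. Before outputting the report, I'll use Python and Pandas to format the data and run additional calculations, and I'm using the pd.read_sql_query() method with raw SQL to pull the data into Python.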

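The information is spread across three tables: users, intervals, and claimed. claimed is a many-to-many mapping between intervals and users. I expect multiple users to come back, so I group them with a PARTITION BY username clause. Please let me know if the layout could be causing the problem, since my example below has been simplified somewhat.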

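I recently found various resources discussing the gaps-and-islands problem, including one that seemed to fit my use case, which I adapted; reference: Gaps and islands. It appears to be written for MSSQL, although I don't believe it says so there.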

The problem is that some results don't come back as I expect. I created a minimal SQL Fiddle example: sqlfiddle

This is one part of the island detection. I'm using MAX(endtime) and MIN(starttime), but in some cases I'm missing the last time interval.

Ex: the table below shows a fragment. I expect the result to show the start time as 2020-03-08T00:00:00 and the end time as 2020-03-08T04:00:00, but I actually get an end time of 2020-03-08T03:30:00.

╔═════════════╦═════════════════════╦═════════════════════╗
║  Username   ║     Start Time      ║      End Time       ║
╠═════════════╬═════════════════════╬═════════════════════╣
║ Test User 1 ║ 2020-03-08T02:00:00 ║ 2020-03-08T02:30:00 ║
║ Test User 1 ║ 2020-03-08T02:30:00 ║ 2020-03-08T03:00:00 ║
║ Test User 1 ║ 2020-03-08T03:00:00 ║ 2020-03-08T03:30:00 ║
║ Test User 1 ║ 2020-03-08T03:30:00 ║ 2020-03-08T04:00:00 ║
╚═════════════╩═════════════════════╩═════════════════════╝

This is my example from the SQL Fiddle; there is more data, but it is all for a single user.
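To make the expectation concrete, here is a minimal Python sketch that collapses only the four rows shown in the table fragment above into islands. It assumes naive timestamps (no time zone handling), and `merge_islands` is a hypothetical helper, not part of the query. It uses the same idea as the `PreviousEndDate >= starttime` test, except that it compares against the running maximum end of the current island rather than just the previous row. For these four rows alone, the expected island runs from 02:00 to 04:00; the 00:00 start mentioned earlier would come from rows not shown in the fragment.

```python
from datetime import datetime

# Rows copied from the table fragment above: (username, start, end).
rows = [
    ("Test User 1", "2020-03-08T02:00:00", "2020-03-08T02:30:00"),
    ("Test User 1", "2020-03-08T02:30:00", "2020-03-08T03:00:00"),
    ("Test User 1", "2020-03-08T03:00:00", "2020-03-08T03:30:00"),
    ("Test User 1", "2020-03-08T03:30:00", "2020-03-08T04:00:00"),
]


def merge_islands(rows):
    """Collapse touching or overlapping intervals per user into islands."""
    parsed = sorted(
        (user, datetime.fromisoformat(start), datetime.fromisoformat(end))
        for user, start, end in rows
    )
    islands = []
    for user, start, end in parsed:
        last = islands[-1] if islands else None
        if last and last[0] == user and start <= last[2]:
            # Same island (island end >= this start): extend the end if needed.
            last[2] = max(last[2], end)
        else:
            # First row for this user, or a gap: start a new island.
            islands.append([user, start, end])
    return [tuple(island) for island in islands]


islands = merge_islands(rows)
for user, start, end in islands:
    print(user, start.isoformat(), end.isoformat())
# prints: Test User 1 2020-03-08T02:00:00 2020-03-08T04:00:00
```

If the SQL result for the same rows ends at 03:30 instead, the difference has to come from something outside this merge logic, such as how the rows are filtered or how the timestamps are stored.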

SELECT username,
       islandId,
       MIN(starttime) as IslandStartDate,
       MAX(endtime) as IslandEndDate
FROM
        (SELECT *,
                CASE
                    WHEN Groups.PreviousEndDate >= starttime THEN 0
                    ELSE 1
                END as IslandStartInd,
                SUM(CASE
                        WHEN Groups.PreviousEndDate >= starttime then 0
                        else 1
                    end) OVER (PARTITION BY Groups.username
                               ORDER BY Groups.RN) as IslandId
         FROM
                 ( SELECT ROW_NUMBER() over (PARTITION BY tr.username
                                             order by tr.starttime,
                                                      tr.endtime) as rn ,
                                            tr.username ,
                                            tr.starttime ,
                                            tr.endtime ,
                                            LAG(tr.endtime, 1) OVER (PARTITION BY tr.username
                                                                     ORDER BY tr.starttime,
                                                                              tr.endtime) as PreviousEndDate
                  FROM timerange tr
                  WHERE tr.starttime BETWEEN '2020-03-01' AND '2020-03-20'
                  ORDER BY tr.username) Groups ) Islands
Group BY username,
         islandid
ORDER BY username,
         IslandStartDate

I refactored the gaps-and-islands approach using window functions and common table expressions to make it easier to follow.

You can uncomment the commented-out queries at the bottom (one at a time) to step through how the strategy works.

The sqlfiddle.

with gaps as (
  select *,
         case 
           when starttime = lag(endtime) over (partition by username 
                                                     order by starttime) then 0
           else 1
         end as gap_begin_row_marker
    from timerange
), grp_numbers as (
  select username, starttime, endtime,
         sum(gap_begin_row_marker) over (partition by username
                                             order by starttime) as grp_num
    from gaps
), collapsed_intervals as(
  select grp_num, username, min(starttime) as starttime, max(endtime) as endtime
    from grp_numbers
   group by grp_num, username
), summed_time as (
  select username, sum(endtime - starttime) as time_claimed
    from collapsed_intervals
   group by username
)
/* select * from gaps; */
/* select * from grp_numbers; */
/* select * from collapsed_intervals; */
select * from summed_time;
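
For readers who want to trace the stages outside the database, the following is a rough Python sketch that mirrors the four CTEs on a handful of rows. The first four rows come from the question's table fragment; the fifth row is made up so that a second island appears. Timestamps are assumed to be naive, and the variable names simply echo the CTE names.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows in the shape of the timerange table: (username, starttime, endtime).
timerange = [
    ("Test User 1", datetime(2020, 3, 8, 2, 0), datetime(2020, 3, 8, 2, 30)),
    ("Test User 1", datetime(2020, 3, 8, 2, 30), datetime(2020, 3, 8, 3, 0)),
    ("Test User 1", datetime(2020, 3, 8, 3, 0), datetime(2020, 3, 8, 3, 30)),
    ("Test User 1", datetime(2020, 3, 8, 3, 30), datetime(2020, 3, 8, 4, 0)),
    # Made-up row after a gap, so that a second island appears.
    ("Test User 1", datetime(2020, 3, 8, 6, 0), datetime(2020, 3, 8, 6, 30)),
]

# Sort by (username, starttime): the partition and order of the window functions.
ordered = sorted(timerange, key=lambda r: (r[0], r[1]))

# gaps + grp_numbers: mark rows that do not start where the previous row ended,
# then keep a per-user running sum of those markers.
grp_numbers = []
for user, user_rows in groupby(ordered, key=lambda r: r[0]):
    prev_end, grp_num = None, 0
    for _, start, end in user_rows:
        grp_num += 0 if start == prev_end else 1
        grp_numbers.append((user, grp_num, start, end))
        prev_end = end

# collapsed_intervals: min(starttime) and max(endtime) per (username, grp_num).
collapsed = {}
for user, grp_num, start, end in grp_numbers:
    lo, hi = collapsed.get((user, grp_num), (start, end))
    collapsed[(user, grp_num)] = (min(lo, start), max(hi, end))

# summed_time: total claimed duration per user.
summed_time = {}
for (user, _), (start, end) in collapsed.items():
    summed_time[user] = summed_time.get(user, timedelta()) + (end - start)

for (user, grp_num), (start, end) in collapsed.items():
    print(user, grp_num, start.isoformat(), end.isoformat())
print(summed_time)
```

With these rows the sketch yields two islands (02:00 to 04:00 and 06:00 to 06:30) and a total of 2 hours 30 minutes for the user. Note that, like the query, it starts a new island whenever a row's start differs from the previous row's end, so overlapping intervals would be split rather than merged.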