SQL counting days with gap / overlapping

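I'm working on a "counting days" problem that is almost identical to this one. I have a list of date ranges and need to count how many days were used, without counting overlapping days twice and while handling gaps. Same input and output.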

From Markus Jarderot:

Input
ID   d1           d2
 1   2011-08-01   2011-08-08
 1   2011-08-02   2011-08-06
 1   2011-08-03   2011-08-10
 1   2011-08-12   2011-08-14
 2   2011-08-01   2011-08-03
 2   2011-08-02   2011-08-06
 2   2011-08-05   2011-08-09

Output
ID   hold_days
 1          11
 2           8

SQL to find time elapsed from multiple overlapping intervals

But I have not been able to understand Markus Jarderot's solution.

SELECT DISTINCT
    t1.ID,
    t1.d1 AS date,
    -DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2                   -- Join for any events occurring while this
    ON t2.ID = t1.ID                  -- is starting. If this is a start point,
    AND t2.d1 <> t1.d1                -- it won't match anything, which is what
    AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0

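Why does DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) pick MIN(d1) from the whole table? Doesn't that ignore the ID?

What does t1.d1 BETWEEN t2.d1 AND t2.d2 do? Is it there to make sure only overlapping intervals are counted?

The same goes for the GROUP BY: I assume it is there so that identical periods are discarded? I tried to trace the solution by hand, but only got more confused.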
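For anyone tracing the fragment by hand, the following Python sketch restates what it appears to select. It is an interpretation, not Markus Jarderot's code: the names ORDERS and start_points are made up for illustration, and the data is the sample input above. The reference date comes from MIN(d1) over the whole table, so it is the same for every row whatever its ID; presumably it only serves as a common origin for the day offsets.

```python
from datetime import date

# Sample rows from the question: (ID, d1, d2).
ORDERS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]


def start_points(orders):
    """Keep each d1 that no other interval of the same ID covers."""
    # MIN(d1) over the whole table, regardless of ID.
    global_min = min(d1 for _, d1, _ in orders)
    result = set()  # a set plays the role of SELECT DISTINCT
    for id1, d1, _ in orders:
        # The LEFT JOIN conditions: same ID, different start, t2 covers t1.d1.
        covered = any(
            id2 == id1 and e1 != d1 and e1 <= d1 <= e2
            for id2, e1, e2 in orders
        )
        if not covered:  # HAVING COUNT(t2.ID) = 0
            n = -(d1 - global_min).days  # -DATEDIFF(DAY, MIN(d1), t1.d1)
            result.add((id1, d1, n))
    return sorted(result)


for row in start_points(ORDERS):
    print(row)
```

On the sample input this keeps 2011-08-01 and 2011-08-12 for ID 1 and 2011-08-01 for ID 2, that is, the dates on which a run of overlapping intervals begins.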

The brute-force approach is to generate every single day (in a recursive query) and then count them:

with dates(id, day, d2) as
(
  select id, d1 as day, d2 from mytable
  union all
  select id, day + 1, d2 from dates where day < d2
)
select id, count(distinct day)
from dates
group by id
order by id;

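Unfortunately, some Oracle versions have a bug that prevents recursive queries on dates from working. So try this code and see whether it works on your system. (I have Oracle 11.2 and the bug is still there, so I guess you need Oracle 12c.)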
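If the recursive query cannot be run, the same brute-force idea can be checked outside the database. The sketch below is a Python restatement under the assumption that both d1 and d2 count as used days, which is what the recursion does; ROWS and count_distinct_days are illustrative names, and the data is the sample input from the question.

```python
from datetime import date, timedelta

# Sample rows from the question: (ID, d1, d2).
ROWS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]


def count_distinct_days(rows):
    """Expand every interval into its days and count distinct days per ID."""
    days_by_id = {}
    for id_, d1, d2 in rows:
        seen = days_by_id.setdefault(id_, set())
        day = d1
        while day <= d2:  # inclusive of d2, like the recursive step
            seen.add(day)
            day += timedelta(days=1)
    return {id_: len(seen) for id_, seen in sorted(days_by_id.items())}


print(count_distinct_days(ROWS))
```

With inclusive counting the sample gives 13 days for ID 1 and 9 for ID 2. The 11 and 8 in the question correspond to d2 - d1 per merged range, so one day per merged range has to be subtracted if that is the result you need.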

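If all of your intervals start on different dates, consider them in ascending order of d1 and count the days from d1 to the start of the next interval. You can discard any interval that is contained in another one. The last interval has no successor.

This query should tell you how many days each interval contributes: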

select a.id, a.d1, nvl(min(b.d1), a.d2) - a.d1 as days
from orders a
left join orders b
on a.id = b.id and a.d1 < b.d1 and a.d2 between b.d1 and b.d2
group by a.id, a.d1, a.d2  -- a.d2 is used outside an aggregate, so it must be grouped too

Then group by id and sum the days.

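I guess Markus's idea is to find all start points that are not inside another range and all end points that are not inside another range. Then you take the first start point up to the first end point, the next start point up to the next end point, and so on. Since Markus did not use window functions to number the start and end points, he had to find a more complicated way to do this. Here is the query with ROW_NUMBER. Maybe it gives you an idea of what to look for in Markus's query.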

select startpoint.id, sum(endpoint.day - startpoint.day)
from
(
  select id, d1 as day, row_number() over (partition by id order by d1) as rn
  from mytable m1
  where not exists
  (
    select *
    from mytable m2
    where m1.id = m2.id 
    and m1.d1 > m2.d1 and m1.d1 <= m2.d2
  )
) startpoint
join
(
  select id, d2 as day, row_number() over (partition by id order by d1) as rn
  from mytable m1
  where not exists
  (
    select *
    from mytable m2
    where m1.id = m2.id 
    and m1.d2 >= m2.d1 and m1.d2 < m2.d2
  )
) endpoint on endpoint.id = startpoint.id and endpoint.rn = startpoint.rn
group by startpoint.id
order by startpoint.id;
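To make the pairing of start and end points easier to follow, here is a rough Python restatement of the query above, run on the sample input from the question. ROWS and hold_days are illustrative names. One deliberate difference: the sketch sorts the end points by their own value, whereas the query numbers them by d1; for this sample both orders coincide.

```python
from datetime import date

# Sample rows from the question: (ID, d1, d2).
ROWS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]


def hold_days(rows):
    """Pair the n-th uncovered start with the n-th uncovered end per ID."""
    totals = {}
    for id_ in sorted({r[0] for r in rows}):
        group = [(d1, d2) for i, d1, d2 in rows if i == id_]
        # Start points: d1 values not inside any range that began earlier.
        starts = sorted(
            d1 for d1, _ in group
            if not any(e1 < d1 <= e2 for e1, e2 in group)
        )
        # End points: d2 values not inside any range that ends later.
        ends = sorted(
            d2 for _, d2 in group
            if not any(e1 <= d2 < e2 for e1, e2 in group)
        )
        # Position in the sorted lists plays the role of ROW_NUMBER.
        totals[id_] = sum((e - s).days for s, e in zip(starts, ends))
    return totals


print(hold_days(ROWS))
```

For the sample this yields 11 for ID 1 and 8 for ID 2, matching the expected output in the question, because each merged range contributes end minus start.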

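This is mostly a repeat of an earlier answer of mine, extended to group on the id column. It should use a single table scan and does not require a recursive subquery factoring clause (CTE) or self joins.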

Oracle 11g R2 Schema Setup:

CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
  SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
  SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
  SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
  SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
  SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
  SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
  SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
  SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
  SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
  SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
  SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
  SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
  SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
  SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
  SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
  SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
  SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;

Query 1:

SELECT id,
       SUM( days ) AS total_days
FROM   (
  SELECT id,
         dt - LAG( dt ) OVER ( PARTITION BY id
                               ORDER BY dt ) + 1 AS days,
         start_end
  FROM   (
    SELECT id,
           dt,
           CASE SUM( value ) OVER ( PARTITION BY id
                                    ORDER BY dt ASC, value DESC, ROWNUM ) * value
             WHEN 1 THEN 'start'
             WHEN 0 THEN 'end'
           END AS start_end
    FROM   your_table
    UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
  )
  WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id

Results:

| ID | TOTAL_DAYS |
|----|------------|
|  1 |         25 |
|  2 |         13 |
|  3 |          9 |
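The logic of Query 1 can be followed step by step in the Python sketch below. It is a restatement under my reading of the query, not a translation of how Oracle executes it: each start date becomes a +1 event and each end date a -1 event, a running sum tracks how many intervals are open, and a merged range runs from the event that raises the sum to 1 to the event that brings it back to 0. ROWS and total_days are illustrative names; the data is the same as in the schema setup.

```python
from datetime import date

# (id, start_date, end_date), same rows as your_table above.
ROWS = [
    (1, date(2017, 6, 1), date(2017, 6, 3)),
    (1, date(2017, 6, 2), date(2017, 6, 4)),
    (1, date(2017, 6, 6), date(2017, 6, 6)),
    (1, date(2017, 6, 7), date(2017, 6, 7)),
    (1, date(2017, 6, 11), date(2017, 6, 20)),
    (1, date(2017, 6, 14), date(2017, 6, 15)),
    (1, date(2017, 6, 22), date(2017, 6, 25)),
    (1, date(2017, 6, 24), date(2017, 6, 28)),
    (1, date(2017, 6, 27), date(2017, 6, 30)),
    (1, date(2017, 6, 27), date(2017, 6, 28)),
    (2, date(2011, 8, 1), date(2011, 8, 8)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 3), date(2011, 8, 10)),
    (2, date(2011, 8, 12), date(2011, 8, 14)),
    (3, date(2011, 8, 1), date(2011, 8, 3)),
    (3, date(2011, 8, 2), date(2011, 8, 6)),
    (3, date(2011, 8, 5), date(2011, 8, 9)),
]


def total_days(rows):
    """Sweep over +1/-1 events per id and sum the inclusive range lengths."""
    events = []
    for id_, start, end in rows:
        events.append((id_, start, 1))   # like UNPIVOT ... start_date AS 1
        events.append((id_, end, -1))    # like UNPIVOT ... end_date AS -1
    # ORDER BY dt ASC, value DESC: starts sort before ends on the same day.
    events.sort(key=lambda e: (e[0], e[1], -e[2]))

    totals, open_count, range_start = {}, {}, {}
    for id_, dt, value in events:
        running = open_count.get(id_, 0) + value
        open_count[id_] = running
        if value == 1 and running == 1:       # the 'start' rows
            range_start[id_] = dt
        elif value == -1 and running == 0:    # the 'end' rows
            days = (dt - range_start[id_]).days + 1
            totals[id_] = totals.get(id_, 0) + days
    return totals


print(total_days(ROWS))
```

This reproduces the 25, 13 and 9 shown in the results. Both end dates are counted here, which is why ids 2 and 3 come out as 13 and 9 rather than the 11 and 8 listed in the question.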