Overlapping gaps and islands in a school vacation setup
I have to work with this periods table:
id | starts_on | ends_on
----+------------+------------
678 | 2019-12-21 | 2019-12-22
534 | 2019-12-23 | 2020-01-04
679 | 2019-12-28 | 2019-12-29
9 | 2020-01-01 | 2020-01-01
776 | 2020-01-04 | 2020-01-05
7 | 2020-01-06 | 2020-01-06
777 | 2020-01-11 | 2020-01-12
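It lists every period during which students do not have to go to school. Unfortunately, some of the periods overlap. That happens when a school vacation, a weekend, or a public holiday coincide (each of them has its own row).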
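To reproduce the problem locally, the table above could be set up roughly as follows. This is only a sketch: the column types are not stated in the question, so integer and date are assumptions (date fits the integer duration values shown further down, since subtracting two dates in PostgreSQL yields an integer). The primary key and the CHECK constraint are likewise assumptions added for illustration.

```sql
-- Hypothetical setup: column types and constraints are assumed, not taken from the question.
CREATE TABLE periods (
    id        integer PRIMARY KEY,
    starts_on date NOT NULL,
    ends_on   date NOT NULL,
    CHECK (starts_on <= ends_on)  -- a period must not end before it starts
);

-- Sample rows copied from the table above.
INSERT INTO periods (id, starts_on, ends_on) VALUES
    (678, '2019-12-21', '2019-12-22'),
    (534, '2019-12-23', '2020-01-04'),
    (679, '2019-12-28', '2019-12-29'),
    (9,   '2020-01-01', '2020-01-01'),
    (776, '2020-01-04', '2020-01-05'),
    (7,   '2020-01-06', '2020-01-06'),
    (777, '2020-01-11', '2020-01-12');
```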
With the help of "Gaps and islands for school vacations in a country with federal states" I came up with this query:
SELECT p.id, p.starts_on, p.ends_on, grp,
       (Max(ends_on) OVER (PARTITION BY grp) - Min(starts_on) OVER (PARTITION BY grp)
       ) + 1 AS duration, Array_agg(p.id) OVER (PARTITION BY grp)
FROM (SELECT p.*,
             Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER
                 (PARTITION BY 1
                  ORDER BY starts_on
                 ) AS grp
      FROM (SELECT p.*,
                   lag(ends_on) OVER (PARTITION BY 1 ORDER BY starts_on) AS prev_eo
            FROM (SELECT p.id, p.starts_on, p.ends_on FROM periods p
                  WHERE starts_on > '2019-12-15' AND
                        starts_on < '2020-01-15' ) p
           ) p
     ) p;
What I get
The result is:
id | starts_on | ends_on | grp | duration | array_agg
----+------------+------------+-----+----------+---------------
678 | 2019-12-21 | 2019-12-22 | 0 | 15 | {678,534,679}
534 | 2019-12-23 | 2020-01-04 | 0 | 15 | {678,534,679}
679 | 2019-12-28 | 2019-12-29 | 0 | 15 | {678,534,679}
9 | 2020-01-01 | 2020-01-01 | 1 | 1 | {9}
776 | 2020-01-04 | 2020-01-05 | 2 | 3 | {776,7}
7 | 2020-01-06 | 2020-01-06 | 2 | 3 | {776,7}
777 | 2020-01-11 | 2020-01-12 | 3 | 2 | {777}
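The first three rows form grp 0 (ids 678, 534 and 679).
What I want
But ids 9, 776 and 7 should belong to that grp as well. Unfortunately, they overlap. Is there a way to get a result like this (I don't care about the order)?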
id | starts_on | ends_on | grp | duration | array_agg
----+------------+------------+-----+----------+---------------
678 | 2019-12-21 | 2019-12-22 | 0 | 17 | {678,534,679,9,776,7}
534 | 2019-12-23 | 2020-01-04 | 0 | 17 | {678,534,679,9,776,7}
679 | 2019-12-28 | 2019-12-29 | 0 | 17 | {678,534,679,9,776,7}
9 | 2020-01-01 | 2020-01-01 | 0 | 17 | {678,534,679,9,776,7}
776 | 2020-01-04 | 2020-01-05 | 0 | 17 | {678,534,679,9,776,7}
7 | 2020-01-06 | 2020-01-06 | 0 | 17 | {678,534,679,9,776,7}
777 | 2020-01-11 | 2020-01-12 | 1 | 2 | {777}
I would like to know the number of days in the whole island (grp 0) and the ids of the periods it contains.
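This is an interesting variation on your other question. The problem is that lag() only looks at the previous row to check for an overlap. Instead, you want to look at all preceding rows.
Fortunately, you can use a cumulative max() for this purpose: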
SELECT p.id, p.starts_on, p.ends_on, grp,
       (Max(ends_on) OVER (PARTITION BY grp) - Min(starts_on) OVER (PARTITION BY grp)
       ) + 1 AS duration, Array_agg(p.id) OVER (PARTITION BY grp)
FROM (SELECT p.*,
             Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER
                 (PARTITION BY 1
                  ORDER BY starts_on
                 ) AS grp
      FROM (SELECT p.*,
                   MAX(ends_on) OVER (ORDER BY starts_on ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_eo
            FROM (SELECT p.id, p.starts_on, p.ends_on
                  FROM periods p
                  WHERE starts_on > '2019-12-15' AND
                        starts_on < '2020-01-15'
                 ) p
           ) p
     ) p;
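I'm not sure what PARTITION BY 1 is supposed to do, so I did not include it in the cumulative max(). Partitioning by a constant puts every row into a single partition, so leaving it out does not change the result.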
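To make the difference between the two approaches visible, the following sketch computes both candidate values for prev_eo side by side on the sample rows from the question. The column names prev_eo_lag and prev_eo_max are made up for this illustration, and the inline VALUES list assumes the columns are of type date.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH periods (id, starts_on, ends_on) AS (
    VALUES
        (678, DATE '2019-12-21', DATE '2019-12-22'),
        (534, DATE '2019-12-23', DATE '2020-01-04'),
        (679, DATE '2019-12-28', DATE '2019-12-29'),
        (9,   DATE '2020-01-01', DATE '2020-01-01'),
        (776, DATE '2020-01-04', DATE '2020-01-05'),
        (7,   DATE '2020-01-06', DATE '2020-01-06'),
        (777, DATE '2020-01-11', DATE '2020-01-12')
)
SELECT id, starts_on, ends_on,
       -- end date of the immediately preceding row only
       lag(ends_on) OVER (ORDER BY starts_on) AS prev_eo_lag,
       -- latest end date among all preceding rows (current row excluded)
       max(ends_on) OVER (ORDER BY starts_on
                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_eo_max
FROM periods
ORDER BY starts_on;
```

For id 9, lag() returns 2019-12-29 (the end of the short period 679), which makes it look as if a new island starts on 2020-01-01. The cumulative max() returns 2020-01-04 (the end of the long period 534), so the row stays in the same island. The same applies to id 776.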
Here is a rextester.
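Anticipating your next question: this approach has one catch. If several rows have the same start date, the cumulative max is not stable, because the order among those rows is undefined. In that case you either want to remove the duplicates or make the ordering for the cumulative max stable.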
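One way to make the ordering stable is to add a unique column as a tiebreaker. The sketch below assumes that id is unique, which the question does not state explicitly, and adds it to the ORDER BY of both window definitions. The sample rows are inlined and assumed to be of type date. Since they contain no duplicate start dates, the result is the same as before.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH periods (id, starts_on, ends_on) AS (
    VALUES
        (678, DATE '2019-12-21', DATE '2019-12-22'),
        (534, DATE '2019-12-23', DATE '2020-01-04'),
        (679, DATE '2019-12-28', DATE '2019-12-29'),
        (9,   DATE '2020-01-01', DATE '2020-01-01'),
        (776, DATE '2020-01-04', DATE '2020-01-05'),
        (7,   DATE '2020-01-06', DATE '2020-01-06'),
        (777, DATE '2020-01-11', DATE '2020-01-12')
)
SELECT p.id, p.starts_on, p.ends_on, grp,
       (max(ends_on) OVER (PARTITION BY grp) - min(starts_on) OVER (PARTITION BY grp)
       ) + 1 AS duration,
       array_agg(p.id) OVER (PARTITION BY grp)
FROM (SELECT p.*,
             -- running count of rows that start a new island
             count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER
                 (ORDER BY starts_on, id) AS grp
      FROM (SELECT p.*,
                   -- id breaks ties between rows with the same starts_on (assumes id is unique)
                   max(ends_on) OVER (ORDER BY starts_on, id
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_eo
            FROM periods p
            WHERE starts_on > '2019-12-15' AND
                  starts_on < '2020-01-15'
           ) p
     ) p;
```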