Min and max date of blocks of subsequent days
I have a table, and I need to find the minimum and maximum date of each block of consecutive days. My sample table is as follows:
Id | Startdate | Enddate |
---|---|---|
1001 | 2017-06-01 | 2017-06-01 |
1001 | 2017-06-01 | 2017-06-10 |
1001 | 2017-06-02 | 2017-06-03 |
1001 | 2017-06-02 | 2017-06-02 |
1001 | 2017-06-04 | 2017-06-10 |
1001 | 2018-06-08 | 2018-06-08 |
1001 | 2018-06-09 | 2018-06-09 |
1001 | 2018-06-10 | 2018-06-10 |
1001 | 2018-06-11 | 2018-06-11 |
1001 | 2018-06-12 | 2018-06-12 |
1001 | 2018-06-13 | 2018-06-13 |
1001 | 2018-06-14 | 2018-06-14 |
1001 | 2018-06-15 | 2018-06-15 |
1001 | 2019-02-01 | 2019-02-03 |
1001 | 2019-02-01 | 2019-02-06 |
1001 | 2019-02-01 | 2019-02-01 |
1001 | 2019-02-02 | 2019-02-02 |
1001 | 2019-02-03 | 2019-02-03 |
1001 | 2019-02-04 | 2019-02-06 |
1001 | 2019-02-04 | 2019-02-04 |
1001 | 2019-02-05 | 2019-02-05 |
1001 | 2019-02-06 | 2019-02-06 |
1001 | 2019-05-23 | 2019-05-23 |
1001 | 2019-05-24 | 2019-05-24 |
Expected output:
Id | Startdate | Enddate |
---|---|---|
1001 | 2017-06-01 | 2017-06-10 |
1001 | 2018-06-08 | 2018-06-15 |
1001 | 2019-02-01 | 2019-02-06 |
1001 | 2019-05-23 | 2019-05-24 |
I know this can only be achieved with PARTITION BY. I tried the query below, but it does not seem to work. Please advise.
SELECT ID,
       MIN(STARTDATE) AS STARTDATE, MAX(ENDDATE) AS ENDDATE
FROM (SELECT MEMBER_ID, STARTDATE, ENDDATE,
             COUNT(IS_GAP) OVER (ORDER BY MEMBER_ID, STARTDATE) AS RANGE_ID
      FROM (SELECT MEMBER_ID, STARTDATE, ENDDATE,
                   CASE WHEN MAX(ENDDATE) OVER (
                            PARTITION BY MEMBER_ID ORDER BY MEMBER_ID, STARTDATE
                            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                        ) < STARTDATE THEN TRUE END AS IS_GAP
            FROM TABLE T) T) T
GROUP BY RANGE_ID, MEMBER_ID
ORDER BY MEMBER_ID, STARTDATE;
Thanks in advance!
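This is a classic time-gaps problem.
First, I need to build a set of days to work from, which in my example is the _days subquery.
Then I need to compute the groups. I subtract DENSE_RANK from the DATEDIFF day offset: the difference stays constant across consecutive days and changes after every gap, which creates the groups.
Aggregating within those groups then gives the result.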
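The three steps above can be sketched outside SQL to see why the subtraction works. This is only an illustration in Python, not the Snowflake query itself; the function name `consecutive_blocks` and the shortened sample list are assumptions made for the example, and it handles a single Id.

```python
from datetime import date, timedelta
from itertools import groupby


def consecutive_blocks(ranges):
    """Collapse inclusive (start, end) date ranges into blocks of consecutive days."""
    # Step 1: expand every range into the distinct days it covers
    # (the role played by the _days subquery joined to the table).
    days = set()
    for start, end in ranges:
        for offset in range((end - start).days + 1):
            days.add(start + timedelta(days=offset))
    ordered = sorted(days)
    if not ordered:
        return []

    # Step 2: day offset minus rank. Within a run of consecutive days both
    # grow by one per row, so the difference is constant; a gap makes it jump.
    origin = ordered[0]
    keyed = [((day - origin).days - rank, day)
             for rank, day in enumerate(ordered, start=1)]

    # Step 3: aggregate per group key. The keys never decrease, so grouping
    # adjacent rows is enough.
    blocks = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        members = [day for _, day in group]
        blocks.append((members[0], members[-1]))
    return blocks


# Hypothetical, shortened version of the sample data (one Id only).
sample = [
    (date(2017, 6, 1), date(2017, 6, 1)),
    (date(2017, 6, 1), date(2017, 6, 10)),
    (date(2017, 6, 2), date(2017, 6, 3)),
    (date(2017, 6, 4), date(2017, 6, 10)),
    (date(2018, 6, 8), date(2018, 6, 8)),
    (date(2018, 6, 9), date(2018, 6, 9)),
    (date(2018, 6, 10), date(2018, 6, 10)),
    (date(2019, 5, 23), date(2019, 5, 23)),
    (date(2019, 5, 24), date(2019, 5, 24)),
]
blocks = consecutive_blocks(sample)
```

Because the days are de-duplicated before ranking, a plain running counter behaves like DENSE_RANK here; in SQL the join produces duplicate days, which is why DENSE_RANK rather than ROW_NUMBER is needed.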
Sample data:
CREATE OR REPLACE TABLE T1 (
Id INT,
Startdate DATE,
Enddate DATE
);
INSERT INTO T1(Id, Startdate, Enddate)
SELECT *
FROM VALUES
(1001, '2017-06-01', '2017-06-01'),
(1001, '2017-06-01', '2017-06-10'),
(1001, '2017-06-02', '2017-06-03'),
(1001, '2017-06-02', '2017-06-02'),
(1001, '2017-06-04', '2017-06-10'),
(1001, '2018-06-08', '2018-06-08'),
(1001, '2018-06-09', '2018-06-09'),
(1001, '2018-06-10', '2018-06-10'),
(1001, '2018-06-11', '2018-06-11'),
(1001, '2018-06-12', '2018-06-12'),
(1001, '2018-06-13', '2018-06-13'),
(1001, '2018-06-14', '2018-06-14'),
(1001, '2018-06-15', '2018-06-15'),
(1001, '2019-02-01', '2019-02-03'),
(1001, '2019-02-01', '2019-02-06'),
(1001, '2019-02-01', '2019-02-01'),
(1001, '2019-02-02', '2019-02-02'),
(1001, '2019-02-03', '2019-02-03'),
(1001, '2019-02-04', '2019-02-06'),
(1001, '2019-02-04', '2019-02-04'),
(1001, '2019-02-05', '2019-02-05'),
(1001, '2019-02-06', '2019-02-06'),
(1001, '2019-05-23', '2019-05-23'),
(1001, '2019-05-24', '2019-05-24') t(Id, Startdate, Enddate);
Solution:
-- Session variables bounding the calendar that needs to be generated.
SET MINDATE = (SELECT MIN(Startdate) FROM T1);
SET MAXDATE = (SELECT MAX(Enddate) FROM T1);
SET DIFFDAYS = (SELECT DATEDIFF(DAY, $MINDATE, $MAXDATE) + 1);

WITH _days AS (
    -- One row per calendar day from MINDATE through MAXDATE.
    SELECT DATEADD(DAY, SEQ4(), $MINDATE) AS Day
    FROM TABLE(GENERATOR(ROWCOUNT => $DIFFDAYS))
), _grps AS (
    -- Day offset minus dense rank is constant across consecutive covered days.
    SELECT T1.Id
         , D.Day
         , DATEDIFF(DAY, $MINDATE, D.Day) - DENSE_RANK() OVER (PARTITION BY T1.Id ORDER BY D.Day) AS grp
    FROM _days AS D
    JOIN T1 ON D.Day BETWEEN T1.Startdate AND T1.Enddate
)
SELECT Id
     , MIN(Day) AS Startdate
     , MAX(Day) AS Enddate
FROM _grps
GROUP BY Id, grp
ORDER BY Id, Startdate;
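For comparison, the running-maximum idea from the question can also work without expanding the ranges into days. As far as I can tell, comparing the running maximum Enddate directly with Startdate treats the next calendar day as a gap (2018-06-08 is less than 2018-06-09), so adjacent single-day rows never merge; allowing one extra day fixes that. Below is a minimal Python sketch of that logic, assuming a single Id; `merge_ranges` and the shortened sample are names made up for the illustration, not part of the original query.

```python
from datetime import date, timedelta


def merge_ranges(ranges):
    """Merge inclusive (start, end) date ranges that overlap or touch."""
    merged = []
    for start, end in sorted(ranges):
        # merged[-1][1] is the running maximum end date of the current block.
        # A range continues the block if it starts no later than the day
        # after that maximum; otherwise it opens a new block.
        if merged and start <= merged[-1][1] + timedelta(days=1):
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


# Hypothetical, shortened version of the sample data (one Id only).
rows = [
    (date(2017, 6, 1), date(2017, 6, 1)),
    (date(2017, 6, 1), date(2017, 6, 10)),
    (date(2017, 6, 2), date(2017, 6, 3)),
    (date(2017, 6, 4), date(2017, 6, 10)),
    (date(2018, 6, 8), date(2018, 6, 8)),
    (date(2018, 6, 9), date(2018, 6, 9)),
    (date(2018, 6, 10), date(2018, 6, 10)),
    (date(2019, 5, 23), date(2019, 5, 23)),
    (date(2019, 5, 24), date(2019, 5, 24)),
]
merged_rows = merge_ranges(rows)
```

In SQL terms this corresponds to flagging a gap only when the previous maximum Enddate plus one day is still before Startdate, and partitioning the running count by the member as well.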