Impala SQL: Merging rows with overlapping dates. WHERE EXISTS and recursive CTE not supported
I am trying to merge rows with overlapping date intervals in a table in Impala SQL. However, the solutions I have found rely on features that Impala does not support, such as WHERE EXISTS and recursive CTEs.
How can I write a query for this in Impala?
Table: @T
ID  StartDate  EndDate
1   20170101   20170201
2   20170101   20170401
3   20170505   20170531
4   20170530   20170531
5   20170530   20170831
6   20171001   20171005
7   20171101   20171225
8   20171105   20171110
Required Output:
StartDate  EndDate
20170101   20170401
20170505   20170831
20171001   20171005
Example of what I tried to implement, which Impala does not support:
SELECT
    s1.StartDate,
    MIN(t1.EndDate) AS EndDate
FROM @T s1
INNER JOIN @T t1 ON s1.StartDate <= t1.EndDate
    AND NOT EXISTS(SELECT * FROM @T t2
                   WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate)
WHERE NOT EXISTS(SELECT * FROM @T s2
                 WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate)
GROUP BY s1.StartDate
ORDER BY s1.StartDate
Similar questions:
Merge overlapping date intervals
Eliminate and reduce overlapping date ranges
https://gerireshef.wordpress.com/2010/05/02/packing-date-intervals/
https://www.sqlservercentral.com/Forums/Topic826031-8-1.aspx
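-- Merge overlapping ranges with window functions only (no EXISTS, no recursive CTE).
select   min(StartDate) as StartDate
        ,max(EndDate)   as EndDate
from    (-- Step 2: a running count of gap markers gives each merged range its own id
         select   StartDate, EndDate
                 ,count(is_gap) over
                  (
                      order by StartDate, ID
                  ) as range_id
         from    (-- Step 1: mark a row as a gap when every earlier row ends before it starts
                  select   ID, StartDate, EndDate
                          ,case
                               when max(EndDate) over
                                    (
                                        order by StartDate, ID
                                        rows between unbounded preceding
                                             and     1 preceding
                                    ) < StartDate
                               then true
                           end as is_gap
                  from     t
                 ) t
        ) t
group by range_id
order by StartDate
;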
+------------+------------+
| startdate  | enddate    |
+------------+------------+
| 2017-01-01 | 2017-04-01 |
| 2017-05-05 | 2017-08-31 |
| 2017-10-01 | 2017-10-05 |
| 2017-11-01 | 2017-12-25 |
+------------+------------+
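As a cross-check of the logic in the query above, here is a minimal Python sketch of the same running-maximum idea. It is not Impala code; the function name `merge_overlapping` and the constant `SAMPLE_ROWS` are made up for illustration, and dates are assumed to be plain yyyymmdd integers as in the question's table.

```python
from typing import List, Optional, Tuple

# (ID, StartDate, EndDate), with dates as yyyymmdd integers (assumption for this sketch)
Row = Tuple[int, int, int]

SAMPLE_ROWS: List[Row] = [
    (1, 20170101, 20170201),
    (2, 20170101, 20170401),
    (3, 20170505, 20170531),
    (4, 20170530, 20170531),
    (5, 20170530, 20170831),
    (6, 20171001, 20171005),
    (7, 20171101, 20171225),
    (8, 20171105, 20171110),
]


def merge_overlapping(rows: List[Row]) -> List[Tuple[int, int]]:
    """Merge overlapping ranges by tracking the max EndDate of all earlier rows."""
    merged: List[Tuple[int, int]] = []
    running_max: Optional[int] = None  # max EndDate over all preceding rows

    # Same ordering as the window clause: ORDER BY StartDate, ID
    for _id, start, end in sorted(rows, key=lambda r: (r[1], r[0])):
        if running_max is None or running_max < start:
            # First row, or a gap: every earlier row ended before this one starts
            merged.append((start, end))
        else:
            # Overlap: extend the current range (its start is already the minimum)
            range_start, range_end = merged[-1]
            merged[-1] = (range_start, max(range_end, end))
        running_max = end if running_max is None else max(running_max, end)

    return merged


if __name__ == "__main__":
    for start, end in merge_overlapping(SAMPLE_ROWS):
        print(start, end)
```

On the sample table this yields the same four ranges as the query result above, including 20171101 to 20171225.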