SQL: reduce dates to "start - end" ranges
I have a table with repeated rows for multiple dates:
ID    STATE      DATE
----------------------------
id01  connected  2015-04-04
id01  connected  2015-04-05
id01  connected  2015-04-08
id01  disconect  2015-04-11
id01  disconect  2015-04-12
id01  connected  2015-04-13
I want a query that returns a "start date" and an "end date" for each period, with a result like this:
ID    STATE      START DATE  END DATE
----------------------------------------
id01  connected  2015-04-04  2015-04-10
id01  disconect  2015-04-11  2015-04-12
id01  connected  2015-04-13  XXXXXXXXXX
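The last "end date" is not important (last value, null, now()...).
The most important thing is to detect the dates on which the state changes (in this example there is no row for 2015-04-10, and the same state occurs again on 2015-04-13).
A possible solution? (Does not work)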
SELECT ID, STATE, MIN(date), MAX(date)
FROM TABLE
GROUP BY ID, STATE;
It does not work because it merges the two "connected" intervals into one:
ID    STATE      START DATE  END DATE
----------------------------------------
id01  connected  2015-04-04  XXXXXXXXXX
id01  disconect  2015-04-11  2015-04-12
The query runs in Impala (similar to SQL-92).
Impala supports window functions. This is a "gaps-and-islands" problem, so it can be solved with a difference of row numbers:
select id, state, min(date) as start_date, max(date) as end_date
from (select t.*,
             row_number() over (partition by id order by date) as seqnum_id,
             row_number() over (partition by id, state order by date) as seqnum_isd
      from table t
     ) t
group by id, state, (seqnum_id - seqnum_isd);
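The logic behind the difference is not hard, but it can be tricky the first time you see it. It helps to run the subquery on its own and look at the row number values, and at why their difference identifies each group.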
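To make that inspection concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for Impala, and a table named test holding the sample rows from the question; the helper name row_number_diffs is made up for illustration. It runs only the subquery and appends the difference of the two row numbers to each row.

```python
import sqlite3

# Sample rows from the question (state spelling kept as posted).
ROWS = [
    ("id01", "connected", "2015-04-04"),
    ("id01", "connected", "2015-04-05"),
    ("id01", "connected", "2015-04-08"),
    ("id01", "disconect", "2015-04-11"),
    ("id01", "disconect", "2015-04-12"),
    ("id01", "connected", "2015-04-13"),
]

# The inner subquery of the answer, run on its own.
SUBQUERY = """
select t.state, t.date,
       row_number() over (partition by id order by date) as seqnum_id,
       row_number() over (partition by id, state order by date) as seqnum_isd
from test t
order by t.date
"""


def row_number_diffs():
    """Return (state, date, seqnum_id, seqnum_isd, difference) per row."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for Impala
    conn.execute("create table test (id text, state text, date text)")
    conn.executemany("insert into test values (?, ?, ?)", ROWS)
    result = [
        (state, date, seq_id, seq_isd, seq_id - seq_isd)
        for state, date, seq_id, seq_isd in conn.execute(SUBQUERY)
    ]
    conn.close()
    return result


if __name__ == "__main__":
    for row in row_number_diffs():
        print(row)
```

On the sample data the differences are 0, 0, 0, 3, 3, 2 in date order. The difference stays constant while the state does not change, so the pair (state, difference) separates the first connected run, the disconect run, and the final connected run.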
(Posted on behalf of the OP.)
Adapting the "gaps-and-islands" approach to my case, this is the solution:
select
    id,
    state,
    start_date,
    date_add(lag(start_date, 1) over (partition by id order by start_date desc), -1) as end_date
from
    (select id, state, min(date) as start_date, max(date) as end_date
     from (select t.*,
                  row_number() over (partition by id order by date) as seqnum_id,
                  row_number() over (partition by id, state order by date) as seqnum_isd
           from test t
          ) t
     group by id, state, (seqnum_id - seqnum_isd)) t_range
order by start_date;
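The query above takes each range's end date from the start of the next range minus one day. As a rough check of that idea, the sketch below runs an equivalent query in Python against an in-memory SQLite database, which is an assumption made so the example is runnable without Impala. SQLite has no date_add, so date(x, '-1 day') is swapped in for it; the helper name state_ranges is hypothetical.

```python
import sqlite3

# Sample rows from the question (state spelling kept as posted).
ROWS = [
    ("id01", "connected", "2015-04-04"),
    ("id01", "connected", "2015-04-05"),
    ("id01", "connected", "2015-04-08"),
    ("id01", "disconect", "2015-04-11"),
    ("id01", "disconect", "2015-04-12"),
    ("id01", "connected", "2015-04-13"),
]

# Same structure as the OP's query; only the date arithmetic differs:
# SQLite's date(x, '-1 day') replaces Impala's date_add(x, -1).
RANGES = """
select id, state, start_date,
       date(lag(start_date, 1) over (partition by id order by start_date desc),
            '-1 day') as end_date
from (select id, state, min(date) as start_date
      from (select t.*,
                   row_number() over (partition by id order by date) as seqnum_id,
                   row_number() over (partition by id, state order by date) as seqnum_isd
            from test t
           ) t
      group by id, state, (seqnum_id - seqnum_isd)) t_range
order by start_date
"""


def state_ranges():
    """Return (id, state, start_date, end_date) per uninterrupted state run."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for Impala
    conn.execute("create table test (id text, state text, date text)")
    conn.executemany("insert into test values (?, ?, ?)", ROWS)
    result = conn.execute(RANGES).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in state_ranges():
        print(row)
```

Because the window is ordered by start_date descending, lag returns the start of the following range. The last range has no following row, so its end date comes back as NULL (None in Python), which matches the question's note that the final end date does not matter.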