With SQL, how to drop cases after a long gap in a time series?
My data looks like this:
| | CASE_TIMESTAMP | GROUP |
|---|---|---|
| 0 | 2017-12-26 16:12:09+00:00 | A |
| 1 | 2017-12-26 16:12:44+00:00 | A |
| 2 | 2020-04-21 07:00:00+00:00 | A |
| 3 | 2020-07-01 00:05:35+00:00 | A |
| 4 | 2020-08-06 07:00:00+00:00 | A |
| 5 | 2020-08-06 07:00:00+00:00 | A |
| 6 | 2020-08-06 07:00:00+00:00 | A |
| 7 | 2020-08-25 07:00:00+00:00 | B |
| 8 | 2020-09-22 07:00:00+00:00 | B |
| 9 | 2020-09-22 07:00:00+00:00 | B |
| 10 | 2020-12-04 08:00:00+00:00 | B |
| 11 | 2020-12-04 08:00:00+00:00 | B |
| 12 | 2020-12-07 08:00:00+00:00 | B |
| 13 | 2020-12-07 08:00:00+00:00 | B |
| 14 | 2020-12-07 08:00:00+00:00 | B |
| 15 | 2020-12-08 08:00:00+00:00 | B |
| 16 | 2020-12-08 08:00:00+00:00 | B |
| 17 | 2020-12-08 08:00:00+00:00 | B |
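I need to drop the cases that occurred before a gap of more than one day. So in group A that means all cases before 2020-08-06, and in group B all cases before 2020-12-07.
I think I need a window function, but I don't know how to calculate the gap and then drop everything before it. Any ideas?
P.S. I'm on Snowflake.
Using QUALIFY and a windowed MAX to find the latest CASE_TIMESTAMP per GRP: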
CREATE TABLE t(CASE_TIMESTAMP TIMESTAMP, GRP VARCHAR)
AS
SELECT '2017-12-26 16:12:09+00:00','A'
UNION ALL SELECT '2017-12-26 16:12:44+00:00','A'
UNION ALL SELECT '2020-04-21 07:00:00+00:00','A'
UNION ALL SELECT '2020-07-01 00:05:35+00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00+00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00+00:00','A'
UNION ALL SELECT '2020-08-06 07:00:00+00:00','A'
UNION ALL SELECT '2020-08-25 07:00:00+00:00','B'
UNION ALL SELECT '2020-09-22 07:00:00+00:00','B'
UNION ALL SELECT '2020-09-22 07:00:00+00:00','B'
UNION ALL SELECT '2020-12-04 08:00:00+00:00','B'
UNION ALL SELECT '2020-12-04 08:00:00+00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00+00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00+00:00','B'
UNION ALL SELECT '2020-12-07 08:00:00+00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00+00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00+00:00','B'
UNION ALL SELECT '2020-12-08 08:00:00+00:00','B';
Query:
SELECT *
FROM t
QUALIFY CASE_TIMESTAMP >= MAX(CASE_TIMESTAMP) OVER(PARTITION BY GRP)
- INTERVAL '1 days';
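Output: the three 2020-08-06 rows for group A, and the six rows from 2020-12-07 08:00:00 onward for group B.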
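Note that the query above keeps rows within one day of each group's latest timestamp. That matches the sample data, but it never looks at the gaps themselves, so it would cut too much if the final run of cases spanned more than a day. Below is a minimal sketch of a gap-based variant, assuming the same table t. The CTE names gaps and run_starts and the columns PREV_TS and RUN_START are made up for illustration, and it has only been reasoned through against the sample rows.

```sql
-- Sketch: keep only the rows at or after the last gap of more than one day, per group.
WITH gaps AS (
    SELECT
        CASE_TIMESTAMP,
        GRP,
        -- Previous timestamp within the same group (NULL for the first row)
        LAG(CASE_TIMESTAMP) OVER (PARTITION BY GRP ORDER BY CASE_TIMESTAMP) AS PREV_TS
    FROM t
),
run_starts AS (
    SELECT
        CASE_TIMESTAMP,
        GRP,
        -- A row starts a new run if it is the first row of its group
        -- or it comes more than one day after the previous row
        CASE
            WHEN PREV_TS IS NULL
              OR CASE_TIMESTAMP > DATEADD(day, 1, PREV_TS)
            THEN CASE_TIMESTAMP
        END AS RUN_START
    FROM gaps
)
SELECT CASE_TIMESTAMP, GRP
FROM run_starts
-- MAX ignores NULLs, so this is the start of the most recent run in each group
QUALIFY CASE_TIMESTAMP >= MAX(RUN_START) OVER (PARTITION BY GRP)
ORDER BY GRP, CASE_TIMESTAMP;
```

On the sample data this should keep the same rows as the MAX-based query: the latest run starts at 2020-08-06 07:00:00 in group A and at 2020-12-07 08:00:00 in group B (the 2020-12-08 rows are exactly one day later, so they do not start a new run). A group with no gap longer than a day keeps all of its rows, because its first row counts as the run start.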