Determining how far away the next ID is
I have some data; a subset looks like this:
ID data start_time
001 X 2021-12-29 10:54:12.429 +0000
002 Y 2022-01-16 05:07:55.708 +0000
003 Y 2021-12-31 12:25:12.980 +0000
002 A 2022-01-03 12:49:41.866 +0000
001 A 2021-12-30 16:32:13.736 +0000
001 A 2022-01-17 10:10:10.736 +0000
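I want to determine, in minutes, the time difference between a given ID in the dataframe and its next occurrence, ordered by start_time. So if an ID appears at 12:00 and again at 12:01, I want that ID's row to show the time of the next entry and the difference in minutes, using SQL/Snowflake. A CTE is preferred.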
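The following fields should be added:
next_timestamp: the timestamp of the following entry for the same ID
time_diff: the difference in minutes between start_time and next_timestamp
entry_order: which occurrence of this ID the row is (1 for the first, 2 for the second, and so on)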
Expected output:
ID data start_time next_timestamp time_diff entry_order
001 X 2021-12-29 10:54:12.429 +0000 2021-12-30 16:32:13.736 +0000 1778 1
001 A 2021-12-30 16:32:13.736 +0000 2022-01-17 10:10:10.736 +0000 25537 2
003 Y 2021-12-31 12:25:12.980 +0000 NULL NULL 1
002 A 2022-01-03 12:49:41.866 +0000 2022-01-16 05:07:55.708 +0000 18258 1
002 Y 2022-01-16 05:07:55.708 +0000 NULL NULL 2
001 A 2022-01-17 10:10:10.736 +0000 NULL NULL 3
Note that the output is sorted by timestamp in ascending order.
Using LEAD, DATEDIFF, and ROW_NUMBER:
-- next_timestamp is reused by alias in the same SELECT, which Snowflake allows
SELECT *,
LEAD(start_time) OVER(PARTITION BY ID ORDER BY start_time) AS next_timestamp,
DATEDIFF(minute, start_time, next_timestamp) AS time_diff,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY start_time) AS entry_order
FROM tab
ORDER BY start_time;
The LEAD function can be used to find the next start_time for each ID, and the ROW_NUMBER function returns a unique sequence number within each ID.
SELECT *
, LEAD(start_time) OVER (PARTITION BY ID ORDER BY start_time) AS next_timestamp
, DATEDIFF(minute, start_time, LEAD(start_time) OVER (PARTITION BY ID ORDER BY start_time)) AS time_diff
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY start_time) AS entry_order
FROM your_table
ORDER BY start_time
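Since the question prefers a CTE, the same logic could be arranged roughly as follows. This is only a sketch: `your_table` is the placeholder name from the answer above, `ordered` is a CTE name made up for illustration, and the column names are taken from the question.

```sql
-- Sketch only: your_table and the CTE name "ordered" are placeholders.
WITH ordered AS (
    SELECT
        ID,
        data,
        start_time,
        -- start_time of the next row for the same ID, NULL on the last row
        LEAD(start_time) OVER (PARTITION BY ID ORDER BY start_time) AS next_timestamp,
        -- 1 for the first occurrence of an ID, 2 for the second, and so on
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY start_time) AS entry_order
    FROM your_table
)
SELECT
    ID,
    data,
    start_time,
    next_timestamp,
    -- NULL whenever next_timestamp is NULL
    DATEDIFF(minute, start_time, next_timestamp) AS time_diff,
    entry_order
FROM ordered
ORDER BY start_time;
```

Computing the window functions once in the CTE keeps the outer SELECT free of a repeated LEAD call. One caveat, as far as I understand Snowflake's DATEDIFF: it counts minute boundaries crossed rather than full elapsed minutes, so a row can come out one higher than a stopwatch-style count (the 25537 in the expected output may show as 25538). If whole elapsed minutes are needed, FLOOR(DATEDIFF(second, start_time, next_timestamp) / 60) is one option.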