SQL Shift Timeseries (Get Following Row For Multiple Timestamps)
I have a database table that looks like this:
timestamp | entity_id
--------------------+----------
2021-12-01 10:00:00 | A
2021-12-01 09:00:00 | A
2021-12-01 08:00:01 | A
2021-12-01 08:00:00 | B
2021-12-01 07:00:00 | A
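timestamp is UNIQUE, but I don't know in advance how far apart the individual timestamps are. How can I write a statement that produces the following result?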
entity_id | following_entity_id | count
----------+---------------------+------
A | A | 2
A | B | 1
B | A | 1
With pandas I would probably use its shift method, but in this case I need to do it in raw SQL.
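You need the LAG() window function to get the previous (in chronological order) value of entity_id, or the LEAD() window function to get the following value, and then aggregate: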
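SELECT entity_id, following_entity_id, COUNT(*) AS count
FROM (
  -- LAG() returns the entity_id of the chronologically previous row
  SELECT *, LAG(entity_id) OVER (ORDER BY timestamp) AS following_entity_id
  FROM tablename
) AS t -- some databases require an alias on a derived table
WHERE following_entity_id IS NOT NULL
GROUP BY entity_id, following_entity_id;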
See the demo.
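To try the query without a database server, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's standard sqlite3 module and runs the aggregation with either LAG() or LEAD(). The helper name count_pairs, the TEXT column types, and the added ORDER BY are assumptions made for illustration, and the bundled SQLite must be version 3.25 or newer for window functions to be available.

```python
import sqlite3

# Sample rows copied from the question; the table name `tablename` follows the answer.
ROWS = [
    ("2021-12-01 10:00:00", "A"),
    ("2021-12-01 09:00:00", "A"),
    ("2021-12-01 08:00:01", "A"),
    ("2021-12-01 08:00:00", "B"),
    ("2021-12-01 07:00:00", "A"),
]

# {func} is filled in with LAG or LEAD; the final ORDER BY only makes output deterministic.
QUERY = """
SELECT entity_id, following_entity_id, COUNT(*) AS count
FROM (
    SELECT entity_id,
           {func}(entity_id) OVER (ORDER BY timestamp) AS following_entity_id
    FROM tablename
) AS t
WHERE following_entity_id IS NOT NULL
GROUP BY entity_id, following_entity_id
ORDER BY entity_id, following_entity_id
"""


def count_pairs(func):
    """Run the query with LAG or LEAD against an in-memory copy of the sample data."""
    if func not in ("LAG", "LEAD"):
        raise ValueError("func must be 'LAG' or 'LEAD'")
    conn = sqlite3.connect(":memory:")
    try:
        # Column types are an assumption; ISO-formatted text sorts chronologically.
        conn.execute("CREATE TABLE tablename (timestamp TEXT UNIQUE, entity_id TEXT)")
        conn.executemany("INSERT INTO tablename VALUES (?, ?)", ROWS)
        return conn.execute(QUERY.format(func=func)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for func in ("LAG", "LEAD"):
        print(func, count_pairs(func))
```

On this sample both variants return the same counts, because the chronological sequence A, B, A, A, A contains each transition as often as its reverse. On other data LAG() and LEAD() can give different results, so pick the one that matches what "following" means for you: LEAD() pairs each row with the chronologically next row, LAG() with the previous one.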