SQL Shift Timeseries (Get Following Row For Multiple Timestamps)

I have a database table that looks like this:

timestamp           | entity_id
--------------------+----------
2021-12-01 10:00:00 | A
2021-12-01 09:00:00 | A
2021-12-01 08:00:01 | A
2021-12-01 08:00:00 | B
2021-12-01 07:00:00 | A

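The timestamp column is UNIQUE, but I don't know in advance how far apart consecutive timestamps are. How can I write a statement that produces the following result?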

entity_id | following_entity_id | count
----------+---------------------+------
A         | A                   | 2
A         | B                   | 1
B         | A                   | 1

With pandas I would probably use its shift method, but in this case I need to do it in raw SQL.

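You need the LAG() window function to get the entity_id of the chronologically previous row (or the LEAD() window function to get that of the chronologically next row), and then aggregate. Note that in the listing above, which is sorted newest first, the chronologically previous row is the one that follows: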

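SELECT entity_id, following_entity_id, COUNT(*) AS count
FROM (
  -- LAG() reads entity_id from the previous row in timestamp order
  SELECT *, LAG(entity_id) OVER (ORDER BY timestamp) AS following_entity_id
  FROM tablename
) t  -- many databases require an alias on a derived table
WHERE following_entity_id IS NOT NULL  -- the earliest row has no previous row
GROUP BY entity_id, following_entity_id;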

See the demo.
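
If you want to try this locally, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and runs the LEAD() variant mentioned above. It is only an illustration under a few assumptions: the helper name count_transitions and the alias cnt are made up for this example, the table is called tablename as in the query above, and the SQLite library bundled with your Python must be 3.25 or newer for window functions to work.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("2021-12-01 10:00:00", "A"),
    ("2021-12-01 09:00:00", "A"),
    ("2021-12-01 08:00:01", "A"),
    ("2021-12-01 08:00:00", "B"),
    ("2021-12-01 07:00:00", "A"),
]

# LEAD() variant: pair each row with the entity_id of the chronologically
# next row, then count how often each pair occurs.
QUERY = """
SELECT entity_id, following_entity_id, COUNT(*) AS cnt
FROM (
  SELECT entity_id,
         LEAD(entity_id) OVER (ORDER BY timestamp) AS following_entity_id
  FROM tablename
) t
WHERE following_entity_id IS NOT NULL
GROUP BY entity_id, following_entity_id
ORDER BY entity_id, following_entity_id;
"""


def count_transitions(rows):
    """Hypothetical helper: load rows into SQLite and return the pair counts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablename (timestamp TEXT UNIQUE, entity_id TEXT)"
        )
        conn.executemany("INSERT INTO tablename VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in count_transitions(ROWS):
        print(row)
```

On the sample data this prints ('A', 'A', 2), ('A', 'B', 1) and ('B', 'A', 1). The LAG() query gives the same counts here only because the sample happens to contain one A-to-B and one B-to-A transition; on other data the two variants can differ, so pick the one that matches the direction you mean by "following".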