Partitioning by timestamps close to each other (say 30min)

I have a dataset that I want to partition by timestamps that are close to each other (say, less than 30 minutes apart):

Driver | Timestamp
A      | 10/30/2019 05:02:28
A      | 10/30/2019 05:05:28
A      | 10/30/2019 05:09:28
A      | 10/30/2019 05:12:28
A      | 10/30/2019 07:54:28
A      | 10/30/2019 07:57:28
A      | 10/30/2019 08:02:28
A      | 10/30/2019 12:14:28
A      | 10/30/2019 12:17:28
A      | 10/30/2019 12:22:28

How can we partition it like this:

id     | Driver    |    Timestamp
1      |    A      | 10/30/2019 05:02:28
1      |    A      | 10/30/2019 05:05:28
1      |    A      | 10/30/2019 05:09:28
1      |    A      | 10/30/2019 05:12:28
2      |    A      | 10/30/2019 07:54:28
2      |    A      | 10/30/2019 07:57:28
2      |    A      | 10/30/2019 08:02:28
3      |    A      | 10/30/2019 12:14:28
3      |    A      | 10/30/2019 12:17:28
3      |    A      | 10/30/2019 12:22:28

Any help is much appreciated, thanks a lot!

It depends on what exactly you want.

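If you want to start a new group whenever the gap between two consecutive timestamps is 30 minutes or more, you can use lag() and a cumulative (windowed) sum():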

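select
    sum(case 
        when timestamp < lag_timestamp + interval '30' minute
            then 0
            else 1  -- first row of a driver (lag is null) or a gap of 30+ minutes
        end
    ) over(partition by driver order by timestamp) id,  -- running total = group id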
    driver,
    timestamp
from (
    select
        t.*,
        lag(timestamp) over(partition by driver order by timestamp) lag_timestamp
    from mytable t
) t
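To check the lag()/cumulative-sum idea against the expected output, here is a minimal sketch that runs it in SQLite (3.25+ for window functions). The sum is written as a window function with `over (partition by driver order by ...)`, which is what makes it a running total per driver. Assumptions: SQLite has no `interval` literals, so `datetime(lag_ts, '+30 minutes')` stands in for `lag_timestamp + interval '30' minute`, and the timestamps are rewritten as ISO-8601 text in a column named `ts`. Both are illustrative choices, not part of the original answer.

```python
import sqlite3

# Sample data from the question. Assumption: timestamps rewritten as
# ISO-8601 text ("YYYY-MM-DD HH:MM:SS") so SQLite's datetime() can use them,
# stored in a column named ts (hypothetical) instead of "timestamp".
ROWS = [
    ("A", "2019-10-30 05:02:28"),
    ("A", "2019-10-30 05:05:28"),
    ("A", "2019-10-30 05:09:28"),
    ("A", "2019-10-30 05:12:28"),
    ("A", "2019-10-30 07:54:28"),
    ("A", "2019-10-30 07:57:28"),
    ("A", "2019-10-30 08:02:28"),
    ("A", "2019-10-30 12:14:28"),
    ("A", "2019-10-30 12:17:28"),
    ("A", "2019-10-30 12:22:28"),
]

QUERY = """
select
    sum(case
            when ts < datetime(lag_ts, '+30 minutes') then 0
            else 1  -- first row per driver (lag_ts is NULL) or a gap of 30+ minutes
        end) over (partition by driver order by ts) as id,
    driver,
    ts
from (
    select t.*,
           lag(ts) over (partition by driver order by ts) as lag_ts
    from mytable t
) t
order by driver, ts
"""


def group_ids(rows):
    """Load rows into an in-memory table and return (id, driver, ts) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (driver text, ts text)")
    con.executemany("insert into mytable values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in group_ids(ROWS):
    print(row)
```

Because the first row of each driver has a NULL lag, the `case` falls through to `else 1`, so the ids start at 1 as in the desired output.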

I think you want to sessionize each driver's data. Try this approach. It prefixes each session number with its driver to create a driver-specific session_id.

select 
   driver||sum(session_code) over (partition by driver order by timestamp) as session_id,
   driver,
   timestamp
from 
   (select 
      driver,
      timestamp, 
      case when timestamp > lag(timestamp) over (partition by driver order by timestamp) + interval '30' minute 
          then 1 else 0 end as session_code  -- first row per driver gets 0, so sessions are numbered from 0
    from your_table) a
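A small sketch of the driver-prefixed session_id in SQLite, assuming ISO-8601 text timestamps in a column named `ts` and SQLite's `datetime(..., '+30 minutes')` in place of an interval literal. Driver B is invented purely to show that the running sum restarts per driver.

```python
import sqlite3

# Driver A from the question plus a hypothetical driver B, with ISO-8601
# timestamps in a column named ts (assumption, not the original schema).
ROWS = [
    ("A", "2019-10-30 05:02:28"),
    ("A", "2019-10-30 05:05:28"),
    ("A", "2019-10-30 05:09:28"),
    ("A", "2019-10-30 05:12:28"),
    ("A", "2019-10-30 07:54:28"),
    ("A", "2019-10-30 07:57:28"),
    ("A", "2019-10-30 08:02:28"),
    ("A", "2019-10-30 12:14:28"),
    ("A", "2019-10-30 12:17:28"),
    ("A", "2019-10-30 12:22:28"),
    ("B", "2019-10-30 05:03:00"),
    ("B", "2019-10-30 06:00:00"),
]

QUERY = """
select
    driver || sum(session_code) over (partition by driver order by ts) as session_id,
    driver,
    ts
from (
    select driver,
           ts,
           -- 1 marks a row more than 30 minutes after the same driver's
           -- previous row; the first row (lag is NULL) gets 0
           case when ts > datetime(lag(ts) over (partition by driver order by ts),
                                   '+30 minutes')
                then 1 else 0 end as session_code
    from your_table
) a
order by driver, ts
"""


def session_ids(rows):
    """Return the session_id column for rows ordered by driver, ts."""
    con = sqlite3.connect(":memory:")
    con.execute("create table your_table (driver text, ts text)")
    con.executemany("insert into your_table values (?, ?)", rows)
    result = [row[0] for row in con.execute(QUERY)]
    con.close()
    return result


print(session_ids(ROWS))
```

Note the two answers differ at exactly 30 minutes: the first one opens a new group at a gap of 30 minutes or more, while this one requires strictly more than 30 minutes.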

Check whether your version supports the Sessionize table operator:

SELECT * 
FROM Sessionize
 ( ON
    (
      SELECT *
      FROM tab
    )
   PARTITION BY driver
   ORDER BY ts
   USING
     TimeColumn('ts')
     Timeout(1800)
 )
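If Sessionize is not available, the timeout rule can be sketched outside the database. This is a hedged approximation of the idea, not Sessionize itself. It assumes that `Timeout(1800)` means a maximum gap of 1800 seconds between consecutive rows of the same partition, and that a new session starts when the gap exceeds that limit. The session numbering here (from 0) and the function name `sessionize` are illustrative and may not match the operator's actual output columns.

```python
from datetime import datetime, timedelta
from itertools import groupby

FMT = "%m/%d/%Y %H:%M:%S"  # timestamp format used in the question


def sessionize(rows, timeout_seconds=1800):
    """Return (driver, timestamp, session) tuples (hypothetical helper).

    Assumption: a new session starts when the gap to the same driver's
    previous row is greater than timeout_seconds; sessions count from 0.
    """
    timeout = timedelta(seconds=timeout_seconds)
    # Parse before sorting: sorting the raw mm/dd/yyyy strings would
    # misorder rows across months or years.
    parsed = sorted((d, datetime.strptime(ts, FMT)) for d, ts in rows)
    out = []
    # Equivalent of PARTITION BY driver ORDER BY ts.
    for driver, group in groupby(parsed, key=lambda r: r[0]):
        session, prev = 0, None
        for _, ts in group:
            if prev is not None and ts - prev > timeout:
                session += 1
            out.append((driver, ts, session))
            prev = ts
    return out


ROWS = [
    ("A", "10/30/2019 05:02:28"), ("A", "10/30/2019 05:05:28"),
    ("A", "10/30/2019 05:09:28"), ("A", "10/30/2019 05:12:28"),
    ("A", "10/30/2019 07:54:28"), ("A", "10/30/2019 07:57:28"),
    ("A", "10/30/2019 08:02:28"), ("A", "10/30/2019 12:14:28"),
    ("A", "10/30/2019 12:17:28"), ("A", "10/30/2019 12:22:28"),
]

for driver, ts, session in sessionize(ROWS):
    print(session, driver, ts)
```

Adding 1 to the session number gives the 1-based ids from the question.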