Partitioning by timestamps close to each other (say 30min)
I have a dataset that I want to partition by timestamps that are close to each other (say, less than 30 minutes apart):
Driver | Timestamp
A | 10/30/2019 05:02:28
A | 10/30/2019 05:05:28
A | 10/30/2019 05:09:28
A | 10/30/2019 05:12:28
A | 10/30/2019 07:54:28
A | 10/30/2019 07:57:28
A | 10/30/2019 08:02:28
A | 10/30/2019 12:14:28
A | 10/30/2019 12:17:28
A | 10/30/2019 12:22:28
How can we partition it like below:
id | Driver | Timestamp
1 | A | 10/30/2019 05:02:28
1 | A | 10/30/2019 05:05:28
1 | A | 10/30/2019 05:09:28
1 | A | 10/30/2019 05:12:28
2 | A | 10/30/2019 07:54:28
2 | A | 10/30/2019 07:57:28
2 | A | 10/30/2019 08:02:28
3 | A | 10/30/2019 12:14:28
3 | A | 10/30/2019 12:17:28
3 | A | 10/30/2019 12:22:28
Any help is much appreciated, thanks a lot!
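It depends on what you actually want.
If you want to start a new group whenever the gap between two consecutive timestamps is 30 minutes or more, you can use lag() and a cumulative sum():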
select
    -- running count of "gap" rows per driver becomes the group id
    sum(case
            when timestamp < lag_timestamp + interval '30' minute
            then 0
            else 1
        end
    ) over(partition by driver order by timestamp) id,
    driver,
    timestamp
from (
    select
        t.*,
        lag(timestamp) over(partition by driver order by timestamp) lag_timestamp
    from mytable t
) t
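To make the lag() plus cumulative sum() idea easier to check by hand, here is a minimal Python sketch of the same logic run against the sample rows from the question. It is only an illustration, not part of the original answer: the function name assign_group_ids and the 30-minute default are assumptions, and the 30-minute boundary mirrors the comparison in the query above (a gap of exactly 30 minutes starts a new group).

```python
from datetime import datetime, timedelta
from itertools import groupby

FMT = "%m/%d/%Y %H:%M:%S"


def assign_group_ids(rows, gap=timedelta(minutes=30)):
    """Hypothetical helper: rows is an iterable of (driver, timestamp) tuples.

    Returns a list of (id, driver, timestamp), where id increases by one
    each time the gap to the previous row of the same driver is >= gap.
    """
    result = []
    ordered = sorted(rows)  # order by driver, then by timestamp
    for driver, group in groupby(ordered, key=lambda r: r[0]):
        group_id = 0
        previous = None  # plays the role of lag(timestamp)
        for _, ts in group:
            # First row of a driver (no lag value) or a large gap: new group.
            if previous is None or ts - previous >= gap:
                group_id += 1  # cumulative sum of the 0/1 flags
            result.append((group_id, driver, ts))
            previous = ts
    return result


# Sample data copied from the question.
sample = [
    ("A", datetime.strptime(s, FMT))
    for s in [
        "10/30/2019 05:02:28", "10/30/2019 05:05:28",
        "10/30/2019 05:09:28", "10/30/2019 05:12:28",
        "10/30/2019 07:54:28", "10/30/2019 07:57:28",
        "10/30/2019 08:02:28", "10/30/2019 12:14:28",
        "10/30/2019 12:17:28", "10/30/2019 12:22:28",
    ]
]

ids = [row[0] for row in assign_group_ids(sample)]
# ids is [1, 1, 1, 1, 2, 2, 2, 3, 3, 3], matching the desired output.
```

The counter restarts for each driver, which corresponds to the partition by driver clause in the window definitions.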
I think you want to sessionize the data per driver. Try this approach. It appends the session_id to its respective driver to create a driver-specific session_id.
select
    driver||sum(session_code) over (partition by driver order by timestamp) as session_id,
    driver,
    timestamp
from
    (select
        driver,
        timestamp,
        case when timestamp > lag(timestamp) over (partition by driver order by timestamp) + interval '1800' second
             then 1 else 0 end as session_code
     from your_table) a
Check whether your release supports the Sessionize table operator:
SELECT *
FROM Sessionize
 ( ON
     (
       SELECT *
       FROM tab
     )
   PARTITION BY driver
   ORDER BY ts
   USING
     TimeColumn('ts')
     Timeout(1800)
 )