PipelineDB: How to group stream data by every N minutes in a continuous view
How can I group data from a PipelineDB stream by every N minutes in the SELECT of a continuous view?
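A PipelineDB stream receives data about events from many remote hosts. For example, I need to group these events by type, ip, and 5-minute time interval, and count them.
So my input (very roughly):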
time | ip | type
------------------------------------
22:35 | 111.111.111.111 | page_open <-- new interval, ends at 22:40
22:36 | 111.111.111.111 | page_open
22:37 | 111.111.111.111 | page_close
22:42 | 111.111.111.111 | page_close <-- event falls in the next interval, ends at 22:45
22:42 | 222.111.111.111 | page_open
22:43 | 222.111.111.111 | page_open
22:44 | 222.111.111.111 | page_close
22:44 | 111.111.111.111 | page_open
And what the continuous view's SELECT should return:
time | ip | type | count
---------------------------------------------
22:40 | 111.111.111.111 | page_open | 2
22:40 | 111.111.111.111 | page_close | 1
22:45 | 111.111.111.111 | page_open | 1
22:45 | 111.111.111.111 | page_close | 1
22:45 | 222.111.111.111 | page_open | 2
22:45 | 222.111.111.111 | page_close | 1
P.S. Sorry for my English.
You can use the date_round(column, interval) [0] function for this. For example:
CREATE CONTINUOUS VIEW bucketed AS
SELECT date_round(time, '5 minutes') AS bucket, COUNT(*)
FROM input_stream GROUP BY bucket;
[0] http://docs.pipelinedb.com/builtin.html?highlight=date_round
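The view above counts events per bucket only. To also split the counts by ip and type, as the question asks, the same idea extends to a three-column GROUP BY. This is a minimal sketch, assuming the pre-1.0 PipelineDB syntax the answer uses (CREATE STREAM / CREATE CONTINUOUS VIEW) and a hypothetical stream named events with the question's columns. On PipelineDB 1.0+ streams are declared as foreign tables and continuous views via CREATE VIEW ... WITH (action=materialize) instead. Per the linked docs, date_round floors a timestamp to the start of its interval, so events from 22:35 to 22:39 land in the 22:35 bucket. Adding the interval back at read time gives the end-of-interval labels (22:40, 22:45) shown in the question.

```sql
-- Hypothetical stream name (events); columns mirror the question's sample input.
CREATE STREAM events (time timestamp, ip text, type text);

-- Count events per 5-minute bucket, per host and event type.
-- date_round floors each timestamp to the start of its 5-minute interval.
CREATE CONTINUOUS VIEW events_5m AS
SELECT date_round(time, '5 minutes') AS bucket,
       ip,
       type,
       COUNT(*) AS count
FROM events
GROUP BY bucket, ip, type;

-- Read the view, labelling each bucket by the end of its interval
-- (22:35 bucket -> 22:40, 22:40 bucket -> 22:45).
SELECT bucket + interval '5 minutes' AS bucket_end, ip, type, count
FROM events_5m
ORDER BY bucket_end, ip, type;
```

Grouping on the floored bucket keeps the stored keys identical to what date_round returns; the +5 minutes shift is only a presentation step when querying the view.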