How to take aggregations for a repeating window in BigQuery

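I have a dataset in which each id has a sequence of steps. Each step has a timestamp and a channel name. For a given id, a channel can repeat multiple times at different timestamps.

For each block of consecutive repeats of a channel (ordered by timestamp), I am trying to measure how many times the channel occurred.

Here is my sample data: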

with temp as (
select 1 as id, '2019-08-02 13:13:27 UTC' as t_date, 'email' as channel union all
select 1 as id, '2019-08-02 13:14:27 UTC' as t_date, 'email' as channel union all
select 1 as id, '2019-08-02 13:15:27 UTC' as t_date, 'display' as channel union all
select 1 as id, '2019-08-02 13:16:27 UTC' as t_date, 'display' as channel union all
select 1 as id, '2019-08-02 13:17:27 UTC' as t_date, 'email' as channel union all
select 1 as id, '2019-08-02 13:18:27 UTC' as t_date, 'email' as channel union all
select 2 as id, '2019-08-02 13:11:27 UTC' as t_date, 'email' as channel union all
select 2 as id, '2019-08-02 13:12:27 UTC' as t_date, 'email' as channel union all
select 2 as id, '2019-08-02 13:13:27 UTC' as t_date, 'email' as channel union all
select 2 as id, '2019-08-02 13:14:27 UTC' as t_date, 'email' as channel 
)

select id, channel , count(1) appearances
from temp
group by id , channel
order by id

This gives me the following output:

id  channel  appearances
1   email    4
1   display  2
2   email    4

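However, I need something like this:

id  channel  start_date               end_date                 appearances
1   email    2019-08-02 13:13:27 UTC  2019-08-02 13:14:27 UTC  2
1   display  2019-08-02 13:15:27 UTC  2019-08-02 13:16:27 UTC  2
1   email    2019-08-02 13:17:27 UTC  2019-08-02 13:18:27 UTC  2
2   email    2019-08-02 13:11:27 UTC  2019-08-02 13:14:27 UTC  4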

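As the output shows, for each run of consecutive occurrences of a channel, I need to compute the appearances along with the start and end times. For example, the first record in the output belongs to the email channel for id = 1: ordered by timestamp, it starts at 2019-08-02 13:13:27 UTC and ends at 2019-08-02 13:14:27 UTC. The last column shows how many times the email channel repeated before switching to the next channel (display in this case).

How can I achieve this in BigQuery?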

Consider the approach below:

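select id, channel,
  min(t_date) as start_date,
  max(t_date) as end_date,
  count(1) as appearances
from (
  -- a running count of run starts gives every run of consecutive rows its own group_id
  select *, countif(new_group) over (partition by id order by t_date) group_id
  from (
    -- new_group is true on the first row of an id and whenever the channel changes
    select *, ifnull(channel != lag(channel) over win, true) new_group
    from temp  -- the sample-data CTE from the question
    window win as (partition by id order by t_date)
  )
)
group by id, channel, group_id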

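If applied to the sample data in your question, the output is:

id  channel  start_date               end_date                 appearances
1   email    2019-08-02 13:13:27 UTC  2019-08-02 13:14:27 UTC  2
1   display  2019-08-02 13:15:27 UTC  2019-08-02 13:16:27 UTC  2
1   email    2019-08-02 13:17:27 UTC  2019-08-02 13:18:27 UTC  2
2   email    2019-08-02 13:11:27 UTC  2019-08-02 13:14:27 UTC  4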
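
To see why this works, it can help to look at the two helper columns before the final group by. The following is a minimal sketch rather than part of the original query: it reuses only the id = 1 rows from the question, and the CTE name flagged is introduced here purely for illustration.

```sql
with temp as (
  -- the id = 1 rows from the question's sample data
  select 1 as id, '2019-08-02 13:13:27 UTC' as t_date, 'email' as channel union all
  select 1, '2019-08-02 13:14:27 UTC', 'email' union all
  select 1, '2019-08-02 13:15:27 UTC', 'display' union all
  select 1, '2019-08-02 13:16:27 UTC', 'display' union all
  select 1, '2019-08-02 13:17:27 UTC', 'email' union all
  select 1, '2019-08-02 13:18:27 UTC', 'email'
),
flagged as (  -- hypothetical name, used only in this sketch
  -- new_group is true on the first row of each id (lag returns null there)
  -- and on every row whose channel differs from the previous row
  select *,
    ifnull(channel != lag(channel) over (partition by id order by t_date), true) as new_group
  from temp
)
select id, t_date, channel, new_group,
  -- running count of true flags; for id = 1 this yields 1, 1, 2, 2, 3, 3
  countif(new_group) over (partition by id order by t_date) as group_id
from flagged
order by id, t_date
```

Rows that share the same id and group_id form one uninterrupted run of a channel, which is why the final query groups by id, channel, group_id. Note that t_date is a string in the sample data, so min and max compare it lexically; that matches chronological order here only because every value uses the same fixed format.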