Apply mode operation on categorical data in SQL

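I have categorical log data in a BigQuery database that I want to process with a sliding window. I want to apply a MODE operation over a window of size 3 or 5, so that one-off events or category changes are discarded.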

|SysDT | Power_State | Target |
| -------- | -------- | -------- |
|2021-07-01 09:03:57+00:00| EDC | EDC   |
|2021-07-01 09:08:57+00:00| EDC | EDC   |
|2021-07-01 09:13:57+00:00| DWN | EDC   |
|2021-07-01 09:18:57+00:00| EDC | EDC   |
|2021-07-01 09:23:58+00:00| EDC | EDC   |
|2021-07-01 09:28:59+00:00| DWN | EDC   |
|2021-07-01 09:33:59+00:00| EDC | EDC   |

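I tried using the OVER clause to get the required sliding window, but then I need a custom MODE operator. Any ideas on how to modify this query to avoid such a MODE function, or how to write a custom MODE function in BigQuery?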

SELECT *, MODE(Power_State) 
    OVER(ORDER BY SysDT ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as Target
FROM Master_Data.2021_07
ORDER BY SysDT

Any help is much appreciated. Thanks.

Consider the approach below

select sysdt, power_state,
  ( select approx_top_count(state, 1)[offset(0)].value
    from unnest(arr) state
  ) as target
from (
  select *, array_agg(power_state) over win arr
  from your_table
  window win as (order by sysdt rows between 1 preceding and 1 following)
)
-- order by sysdt      
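
As a side note, the question also asks about a window of size 5 and about a custom MODE function. A minimal sketch of both, under stated assumptions: `mode_of` is a hypothetical helper name (not a BigQuery built-in), it counts exactly instead of using `approx_top_count`, and it breaks ties alphabetically, which is an arbitrary choice made only to keep the result deterministic. `your_table` is the same placeholder as above.

```sql
-- Hypothetical helper (not a BigQuery built-in): exact mode of a string array.
create temp function mode_of(arr array<string>) as ((
  select state
  from unnest(arr) as state
  group by state
  order by count(*) desc, state  -- assumed tie-break: alphabetical
  limit 1
));

select sysdt, power_state,
  mode_of(arr) as target
from (
  -- 5-row window: the current row plus 2 rows on each side
  select *, array_agg(power_state) over win arr
  from your_table
  window win as (order by sysdt rows between 2 preceding and 2 following)
);
```

Near the first and last rows the window holds fewer rows, so an even-sized window and therefore a tie is possible there; the tie-break rule decides the result in that case. On the seven sample rows from the question no tie occurs with this 5-row window, and every row resolves to EDC.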

If applied to the sample data in your question

with your_table as (
  select '2021-07-01 09:03:57+00:00' sysdt, 'EDC' power_state union all 
  select '2021-07-01 09:08:57+00:00', 'EDC' union all 
  select '2021-07-01 09:13:57+00:00', 'DWN' union all 
  select '2021-07-01 09:18:57+00:00', 'EDC' union all 
  select '2021-07-01 09:23:58+00:00', 'EDC' union all 
  select '2021-07-01 09:28:59+00:00', 'DWN' union all 
  select '2021-07-01 09:33:59+00:00', 'EDC' 
)           

the output is