Apply mode operation on categorical data in SQL
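I have categorical log data in a BigQuery database that I want to process with a sliding window. I want to apply a MODE operation over a window of size 3 or 5, so that one-off events or category changes are discarded.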
|SysDT | Power_State | Target |
| -------- | -------- | -------- |
|2021-07-01 09:03:57+00:00| EDC | EDC |
|2021-07-01 09:08:57+00:00| EDC | EDC |
|2021-07-01 09:13:57+00:00| DWN | EDC |
|2021-07-01 09:18:57+00:00| EDC | EDC |
|2021-07-01 09:23:58+00:00| EDC | EDC |
|2021-07-01 09:28:59+00:00| DWN | EDC |
|2021-07-01 09:33:59+00:00| EDC | EDC |
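I tried using the OVER clause to get the sliding window I need, but then I still need a custom MODE operator. Any ideas on how to modify this query to avoid such a MODE function, or how to write a custom MODE function in BigQuery?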
SELECT *, MODE(Power_State)
OVER(ORDER BY SysDT ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as Target
FROM Master_Data.2021_07
ORDER BY SysDT
Any help is greatly appreciated. Thanks.
Consider the approach below
select sysdt, power_state,
( select approx_top_count(state, 1)[offset(0)].value
from unnest(arr) state
) as target
from (
select *, array_agg(power_state) over win arr
from your_table
window win as (order by sysdt rows between 1 preceding and 1 following)
)
-- order by sysdt
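If an exact count and a predictable tie-break matter, the same array_agg window can feed a plain GROUP BY instead of approx_top_count, which BigQuery documents as an approximate aggregate function. The sketch below is a variant under assumptions, not a tested replacement for the query above: it widens the frame to 5 rows (2 preceding, 2 following) and breaks ties alphabetically, which is an arbitrary choice. your_table is the same placeholder name as above.

```sql
-- Sketch: exact sliding-window mode over a 5-row frame.
-- Assumption: ties are broken alphabetically by state (arbitrary choice).
select sysdt, power_state,
  ( select state
    from unnest(arr) state
    group by state
    -- most frequent value first; alphabetical order decides ties
    order by count(*) desc, state
    limit 1
  ) as target
from (
  -- collect the values of the current row and its neighbours into an array
  select *, array_agg(power_state) over win arr
  from your_table
  window win as (order by sysdt rows between 2 preceding and 2 following)
)
-- order by sysdt
```

Near the first and last rows the frame holds fewer than 5 values, so ties are still possible there; change the second ORDER BY key if a different winner is preferred.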
If applied to the sample data in your question
with your_table as (
select '2021-07-01 09:03:57+00:00' sysdt, 'EDC' power_state union all
select '2021-07-01 09:08:57+00:00', 'EDC' union all
select '2021-07-01 09:13:57+00:00', 'DWN' union all
select '2021-07-01 09:18:57+00:00', 'EDC' union all
select '2021-07-01 09:23:58+00:00', 'EDC' union all
select '2021-07-01 09:28:59+00:00', 'DWN' union all
select '2021-07-01 09:33:59+00:00', 'EDC'
)
the output is