How do I aggregate values from key-value pairs in Snowflake?
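I have a table with a column that contains a comma-separated list of "event" IDs. Some events have an associated quantity, separated from the ID by an equals sign. Here is an example:
237=33.00,238=98.00,239,100,101, ...
So I have the following events: 237, 238, 239, 100, 101
And these events have an associated quantity: 237 (33.00), 238 (98.00)
I am trying to count the occurrences of events, and to aggregate (average or sum) the events that have a quantity. I would also like to include other dimension columns from the table, such as day.
The query below seems to work, but I'm wondering whether there is a better approach, given all the Snowflake functions for handling semi-structured data. It also doesn't fully solve my problem, because I need each aggregated value in its own column.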
with cte as (
    select
        day,
        EVENTS
    from mytable
    where not EVENTS is null
)
select
    day,
    avg(split_part(value, '=', 2)) as avgOfEvent237
from cte, lateral split_to_table(cte.EVENTS, ',')
where value like '237=%'
group by 1;
| DAY | AVGOFEVENT237 |
|---|---|
| 2022-03-01 | 35.9 |
Given that SPLIT_TO_TABLE drops rows that have nothing to split (a NULL EVENTS value produces no output rows), you can remove the CTE, since all it does is filter out the NULLs.
You can then embed IFF and STARTSWITH inside AVG to get the value for each event in its own column.
SELECT m.day
    ,AVG(iff(startswith(v.value, '237='), split_part(v.value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(startswith(v.value, '238='), split_part(v.value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;
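The question also asks for occurrence counts, and hard-coding one column per event ID gets unwieldy as the number of events grows. As a minimal sketch (not part of the original answer), the same split can be aggregated in long format, one row per day and event. The names `exploded`, `event_id`, `amount`, and `occurrences` are illustrative assumptions, and `TRY_TO_DOUBLE` is swapped in for the `::float` cast so that events without a quantity get a NULL amount rather than a conversion error.

```sql
-- Sketch: one output row per (day, event_id) instead of one column per event.
-- The inline mytable CTE only stands in for the real table.
WITH mytable(day, events) AS (
    SELECT * FROM VALUES
        ('2022-03-23'::date, '237=33.00,238=98.00,239,100,101'),
        ('2022-03-23'::date, '237=35.00,238=96.00,239,100,101')
), exploded AS (
    SELECT m.day
        -- part 1 is the event ID, whether or not an '=' is present
        ,split_part(v.value, '=', 1) AS event_id
        -- part 2 is the quantity; TRY_TO_DOUBLE yields NULL when it is absent
        ,try_to_double(split_part(v.value, '=', 2)) AS amount
    FROM mytable m
        ,LATERAL split_to_table(m.events, ',') v
)
SELECT day
    ,event_id
    ,count(*)    AS occurrences   -- counts every occurrence, with or without a quantity
    ,sum(amount) AS sumOfAmount   -- NULL amounts are ignored by SUM and AVG
    ,avg(amount) AS avgOfAmount
FROM exploded
GROUP BY 1, 2
ORDER BY 1, 2;
```

This trades the "one column per aggregate" shape for rows, so it suits exploration or feeding a later pivot more than a fixed report layout.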
So, using this CTE with some "fake data":
WITH mytable(day, events) as (
    select * FROM VALUES
        ('2022-03-23'::date, '237=33.00,238=98.00,239,100,101'),
        ('2022-03-23'::date, '237=35.00,238=96.00,239,100,101')
)
SELECT m.day
    ,AVG(iff(startswith(v.value, '237='), split_part(v.value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(startswith(v.value, '238='), split_part(v.value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;
I get the results:
| DAY | AVGOFEVENT237 | AVGOFEVENT238 |
|---|---|---|
| 2022-03-23 | 34 | 97 |
The LIKE form from your query is a little shorter:
SELECT m.day
    ,AVG(iff(v.value like '237=%', split_part(v.value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(v.value like '238=%', split_part(v.value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;
And you can break it out into SUM and COUNT alongside AVG, and compute the average by hand from those two:
SELECT m.day
    ,AVG(iff(v.value like '237=%', split_part(v.value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(v.value like '238=%', split_part(v.value, '=', 2)::float, null)) as avgOfEvent238
    ,SUM(iff(v.value like '237=%', split_part(v.value, '=', 2)::float, null)) as sumOfEvent237
    ,SUM(iff(v.value like '238=%', split_part(v.value, '=', 2)::float, null)) as sumOfEvent238
    -- COUNT ignores NULLs, so only the matching events are counted
    ,count(iff(v.value like '237=%', split_part(v.value, '=', 2)::float, null)) as countOfEvent237
    ,count(iff(v.value like '238=%', split_part(v.value, '=', 2)::float, null)) as countOfEvent238
    -- DIV0 returns 0 when the count is 0, whereas AVG returns NULL in that case
    ,DIV0(sumOfEvent237, countOfEvent237) as byHandAvgOfEvent237
    ,DIV0(sumOfEvent238, countOfEvent238) as byHandAvgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;
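The queries above only cover events that carry a quantity. For the occurrence counts the question mentions, including events such as 239 or 100 that have no `=` part, one possible sketch uses `COUNT_IF` on the event ID extracted with `SPLIT_PART`. This is an illustration under the same fake data, not part of the original answer, and the `countOfEvent...` aliases are assumed names.

```sql
-- Sketch: occurrence counts per event, each in its own column.
-- The inline mytable CTE only stands in for the real table.
WITH mytable(day, events) AS (
    SELECT * FROM VALUES
        ('2022-03-23'::date, '237=33.00,238=98.00,239,100,101'),
        ('2022-03-23'::date, '237=35.00,238=96.00,239,100,101')
)
SELECT m.day
    -- comparing on part 1 matches the event with or without a quantity
    ,count_if(split_part(v.value, '=', 1) = '237') AS countOfEvent237
    ,count_if(split_part(v.value, '=', 1) = '239') AS countOfEvent239
    ,count_if(split_part(v.value, '=', 1) = '100') AS countOfEvent100
FROM mytable m
    ,LATERAL split_to_table(m.events, ',') v
GROUP BY 1
ORDER BY 1;
```

Comparing the extracted ID for equality, rather than using `LIKE '239%'`, avoids accidentally matching longer IDs that merely start with the same digits.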