How do I aggregate values from key-value pairs in Snowflake?

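I have a table with a column that contains a comma-separated list of "event" IDs. Some events have an associated amount, separated from the ID by an equals sign. Here is an example: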

237=33.00,238=98.00,239,100,101, ...

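I am trying to count the occurrences of each event and, for events that carry an amount, aggregate that amount (average or sum). I also want to include other dimension columns from the table, such as day.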

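The following seems to work, but I am wondering whether there is a better way, given all of Snowflake's functions for handling semi-structured data. It also doesn't fully solve my problem, because I need each aggregated value in its own column.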

with cte as (
  select
    day,
    EVENTS
  from mytable
  where EVENTS is not null
)
select
    day,
    avg(split_part(value, '=', 2)) as avgOfEvent237
from cte, lateral split_to_table(cte.EVENTS, ',')
where value like '237=%'
group by 1;
DAY AVGOFEVENT237
2022-03-01 35.9

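Since SPLIT_TO_TABLE already drops rows that have nothing to split (a NULL input produces no output rows), you can remove the CTE, whose only job is to filter out NULLs.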

Then you can embed IFF and STARTSWITH inside AVG to pick out the amount for each event, giving one column per event.

SELECT m.day
    ,AVG(iff(startswith(v.value, '237='), split_part(value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(startswith(v.value, '238='), split_part(value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;

So, using this CTE with some fake data:

WITH mytable(day, events) as (
    select * FROM VALUES
    ('2022-03-23'::date, '237=33.00,238=98.00,239,100,101'),
    ('2022-03-23'::date, '237=35.00,238=96.00,239,100,101')
)
SELECT m.day
    ,AVG(iff(startswith(v.value, '237='), split_part(value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(startswith(v.value, '238='), split_part(value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;

I get the result:

DAY AVGOFEVENT237 AVGOFEVENT238
2022-03-23 34 97

The version using LIKE, as in your query, is slightly shorter:

SELECT m.day
    ,AVG(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as avgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;

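You can also break it out to compute your own SUM, COUNT and AVG. Snowflake lets a later expression in the same SELECT reference an earlier column alias, which is why DIV0 can use sumOfEvent237 and countOfEvent237 directly: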

SELECT m.day
    ,AVG(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as avgOfEvent237
    ,AVG(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as avgOfEvent238
    ,SUM(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as sumOfEvent237
    ,SUM(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as sumOfEvent238
    ,count(iff(v.value like '237=%', split_part(value, '=', 2)::float, null)) as countOfEvent237
    ,count(iff(v.value like '238=%', split_part(value, '=', 2)::float, null)) as countOfEvent238
    ,DIV0(sumOfEvent237, countOfEvent237) as byHandAvgOfEvent237
    ,DIV0(sumOfEvent238, countOfEvent238) as byHandAvgOfEvent238
FROM mytable m
    ,lateral split_to_table(m.EVENTS, ',') v
GROUP BY 1
ORDER BY 1;
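
The queries above only aggregate events that carry an amount. The question also asks for occurrence counts, including events such as 239 that have no "=amount" part. A minimal sketch of one way to do that, assuming the same fake data: split each element once into an event ID and an optional amount, then count on the ID. The parsed CTE and the event_id and amount column names are made up for illustration. TRY_TO_DOUBLE is used in place of ::float so that an element without an amount yields NULL rather than a cast error.

```sql
-- Hypothetical sample data, same shape as the fake data used earlier.
WITH mytable(day, events) AS (
    SELECT * FROM VALUES
    ('2022-03-23'::date, '237=33.00,238=98.00,239,100,101'),
    ('2022-03-23'::date, '237=35.00,238=96.00,239,100,101')
), parsed AS (
    -- One row per list element: the ID before '=', and the amount after it
    -- (NULL when the element has no amount or the amount is not numeric).
    SELECT m.day
        ,split_part(v.value, '=', 1) AS event_id
        ,try_to_double(split_part(v.value, '=', 2)) AS amount
    FROM mytable m
        ,lateral split_to_table(m.events, ',') v
)
SELECT day
    ,count_if(event_id = '237') AS countOfEvent237   -- event with an amount
    ,count_if(event_id = '239') AS countOfEvent239   -- event without an amount
    ,sum(iff(event_id = '237', amount, null)) AS sumOfEvent237
    ,avg(iff(event_id = '237', amount, null)) AS avgOfEvent237
FROM parsed
GROUP BY 1
ORDER BY 1;
```

Matching on the exact ID (event_id = '239') rather than a prefix also avoids accidentally matching a longer ID that happens to start with the same digits. If the list can contain spaces after the commas, wrapping the split_part calls in TRIM would probably be needed.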