Aggregate a JSON object's own key-value attributes in Athena using OpenX SerDe

I have a JSON structure that looks like the following two example events:

Event 1

    {
      "event":{
             "type" : "FooBarEvent",
             "kv":{
                "key1":"value1",
                "key2":"value2",
                "3":"three",
                "d":"4"
             }
      }
    }

Event 2

    {
      "event":{
             "type" : "FooBarEvent",
             "kv":{
                "key1":"value1",
                "key2":"value2000",
                "e": "4"
             }
      }
    }

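Note that I don't know in advance which keys and values will come in, and I want to aggregate (count) them. For these two events, the output would look like this: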

+-----------+------+-----------+--------+
| EventType | Key  | Value     | Amount |
+-----------+------+-----------+--------+
| Foobar    | key1 | value1    | 2      |
+-----------+------+-----------+--------+
| Foobar    | key2 | value2    | 1      |
+-----------+------+-----------+--------+
| Foobar    | key2 | value2000 | 1      |
+-----------+------+-----------+--------+
| Foobar    | 3    | three     | 1      |
+-----------+------+-----------+--------+
| Foobar    | d    | 4         | 1      |
+-----------+------+-----------+--------+
| Foobar    | e    | 4         | 1      |
+-----------+------+-----------+--------+

Is there a way to do this in Athena without changing the JSON structure? What is the best way to map and flatten/query the structure?

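Hi, this should be possible by casting kv to a map and then using UNNEST. Assuming your data is stored in a table named json_data, with each raw JSON document in a string column named json_field, the following query should work:

    -- Pull the event type and the kv object out of the raw JSON string.
    with data_formated as
    (
        select *
        ,json_extract_scalar(json_field,'$.event.type') event_type
        ,cast(json_extract(json_field,'$.event.kv') as map(varchar,varchar)) key_value
        from json_data
    )
    -- Expand every map entry into its own row with columns k and v.
    ,unnesting_data as
    (
        select *
        from data_formated
        cross join unnest(key_value) as t (k,v)
    )
    -- Count each (event type, key, value) combination.
    select event_type,k,v,count(1) amount
    from unnesting_data
    group by 1,2,3
    order by 1,2,3;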
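
Since the question mentions the OpenX SerDe, another option is to declare kv as a map in the table definition so that no JSON parsing is needed at query time. The following is only a sketch under some assumptions: the table name json_events and the S3 location are placeholders, not something from the question, and the SerDe expects each JSON event to sit on a single line in the file rather than pretty-printed as in the examples above.

    -- Hypothetical table; json_events and the S3 path are placeholders.
    -- Declaring kv as map<string,string> lets the keys stay unknown in advance.
    CREATE EXTERNAL TABLE json_events (
      event struct<type:string, kv:map<string,string>>
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://your-bucket/path/to/events/';

    -- Expand the map into one row per key/value pair, then count.
    SELECT event.type AS event_type, t.k, t.v, count(*) AS amount
    FROM json_events
    CROSS JOIN UNNEST(event.kv) AS t (k, v)
    GROUP BY 1, 2, 3
    ORDER BY 1, 2, 3;

Athena runs one statement per query, so the CREATE and the SELECT would be submitted separately. For the two sample events, the event_type column would contain FooBarEvent, the literal value of event.type.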