Athena/Presto: find key with the max value in JSON object

I have a column (string type) in Athena that holds JSON like the following:

{
    "key1": 1.1,
    "key2":2.2,
    "key3": 3.3
}

How can I write a query that returns, for each row, the key with the max value and the associated value (here key3 and 3.3)?

Note: the key names are not known in advance (and there can be several of them).

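I found a way to do it, but it looks complicated, so I would appreciate it if anyone has a better solution. Assume there is a column named id and the JSON is stored in a separate column, json_col: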

with d as (
    -- cast the values to DOUBLE so they compare numerically, not as strings
    select id,
    CAST(json_extract(json_col, '$') AS MAP(VARCHAR, DOUBLE)) as s
    from TABLE_NAME
),

d2 as (
    -- one row per (id, key) pair, with the value looked up from the map
    select *,
    element_at(s, key) AS value
    from d
    cross join unnest(map_keys(s)) AS sx(key)
),

d3 as (
    -- "order" is a reserved word, so the rank column is named rnk
    select id, key, value,
    rank() over (partition by id order by value desc) as rnk
    from d2
)

select id, key, value from d3 where rnk = 1

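Basically, this first converts the JSON object to a map, then unnests the map keys with a cross join and stores each value in a separate column, then computes a rank partitioned by id and ordered by value descending, and finally keeps only the rows whose rank is 1.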
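The same steps can probably be collapsed into a single aggregation. The following is only a sketch, not a tested replacement: it assumes the TABLE_NAME, id and json_col placeholders from the query above, assumes id is unique per row, and introduces the aliases t, k and v purely for illustration. It unnests the map directly into key/value columns and uses max_by to pick the key that belongs to the largest value.

```sql
-- Sketch only: TABLE_NAME, id and json_col are the placeholders used above;
-- t, k and v are hypothetical aliases.
SELECT
    id,
    max_by(k, v) AS key,   -- key paired with the largest value
    max(v)       AS value  -- the largest value itself
FROM TABLE_NAME
CROSS JOIN UNNEST(
    CAST(json_parse(json_col) AS MAP(VARCHAR, DOUBLE))
) AS t (k, v)
GROUP BY id
```

One difference to keep in mind: when several keys share the maximum value, rank() returns all of them, whereas max_by returns only one of them.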

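You can cast the JSON to MAP(VARCHAR, INTEGER) and process it. Note that the INTEGER cast loses the fractional part of each value; use MAP(VARCHAR, DOUBLE) instead if you need to keep the decimals. For example (this uses the map_entries function to turn the map into an array of rows and the reduce array function, and relies on the default row field naming convention):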

WITH dataset AS (
    SELECT *
    FROM (VALUES
        (JSON '{
            "key1": 1.1,
            "key2":2.2,
            "key3": 3.3
        }'),
        (JSON '{
            "key0": 1.1,
            "key1":4.4,
            "key2": 3.3
        }')) AS t (json))

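-- field0/field1 are the default names of the (key, value) fields of each map entry
SELECT row.field0 as key, row.field1 as value
FROM
    (SELECT reduce(
      map_entries(CAST(json as MAP(VARCHAR, INTEGER))),
      ROW (null, null),  -- initial state: no entry seen yet
      -- keep the entry with the larger value; comparing against the null start yields null, so curr is taken
      (agg, curr) -> IF (agg.field1 > curr.field1, agg, curr),
      s -> s) as row
    FROM dataset)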

Output:

key value
key3 3
key1 4
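If ties matter, or if the decimals should be preserved, a variant without reduce might look like the sketch below. It is an assumption-laden illustration rather than part of the answer above: it casts to MAP(VARCHAR, DOUBLE), introduces the hypothetical aliases m, max_keys and max_value, and uses map_filter together with array_max to keep every key whose value equals the row's maximum.

```sql
-- Sketch only: m, max_keys and max_value are hypothetical names.
WITH dataset AS (
    SELECT CAST(json AS MAP(VARCHAR, DOUBLE)) AS m
    FROM (VALUES
        (JSON '{"key1": 1.1, "key2": 2.2, "key3": 3.3}'),
        (JSON '{"key0": 1.1, "key1": 4.4, "key2": 3.3}')) AS t (json)
)
SELECT
    -- every key whose value equals the largest value in the map
    map_keys(map_filter(m, (k, v) -> v = array_max(map_values(m)))) AS max_keys,
    -- the largest value itself
    array_max(map_values(m)) AS max_value
FROM dataset
```

This returns the keys as an array per row, so it needs no id column, at the cost of computing the maximum once per map entry inside the lambda.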