Athena/Presto: find the key with the max value in a JSON object
I have a column (string type) in Athena that holds JSON like this:
{
"key1": 1.1,
"key2":2.2,
"key3": 3.3
}
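How can I write a query that returns, for each row, the key with the maximum value and the associated value (here key3 and 3.3)?
Note: the key names are not known in advance (and there can be any number of them).
I found a way to do it, but it looks quite complicated, so I would appreciate a better solution if anyone has one. Assume there is a column named id and the JSON is stored in a separate column: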
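with d as (
    -- turn the JSON string into a map; DOUBLE values so they compare numerically, not as text
    select id,
           CAST(json_extract(json_col, '$') AS MAP(VARCHAR, DOUBLE)) as s
    from TABLE_NAME
),
d2 as (
    -- one row per (id, key), with the matching value looked up in the map
    select *,
           element_at(s, key) AS value
    from d
    cross join unnest(map_keys(s)) AS sx(key)
),
d3 as (
    -- rank the entries of each id, highest value first
    -- (the alias is rnk because order is a reserved word)
    select id, key, value,
           rank() over (partition by id order by value desc) as rnk
    from d2
)
select id, key, value from d3 where rnk = 1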
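Basically, it first converts the JSON object to a map, then unnests the map keys with a cross join and puts the values in a separate column, then computes a rank per id ordered by value, and finally keeps only the rows with rank = 1.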
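You can cast the JSON to MAP(VARCHAR, INTEGER) and process it. For example (this uses the map_entries function to turn the map into an array of rows and the reduce array function, and relies on the default row naming convention):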
WITH dataset AS (
    SELECT *
    FROM (VALUES
        (JSON '{
            "key1": 1.1,
            "key2": 2.2,
            "key3": 3.3
        }'),
        (JSON '{
            "key0": 1.1,
            "key1": 4.4,
            "key2": 3.3
        }')) AS t (json))
SELECT row.field0 AS key, row.field1 AS value
FROM
    (SELECT reduce(
            map_entries(CAST(json AS MAP(VARCHAR, INTEGER))),
            -- start from an empty (null, null) entry
            ROW (null, null),
            -- keep the accumulated entry only if its value is larger than the current one
            (agg, curr) -> IF (agg.field1 > curr.field1, agg, curr),
            s -> s) AS row
     FROM dataset)
Output:
key | value |
---|---|
key3 | 3 |
key1 | 4 |
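The query above casts the values to INTEGER, which is why the output shows 3 and 4 rather than 3.3 and 4.4. As a hedged sketch of a variant that keeps the fractional values and carries an id column, the following swaps reduce for the max_by aggregate over an unnested map. The sample rows, the column names id and json_col, and the aliases k and v are assumptions made for illustration, not part of the original answer.

```sql
-- Sketch only: hypothetical sample data with an id and a JSON string column.
WITH dataset AS (
    SELECT *
    FROM (VALUES
        (1, '{"key1": 1.1, "key2": 2.2, "key3": 3.3}'),
        (2, '{"key0": 1.1, "key1": 4.4, "key2": 3.3}')
    ) AS t (id, json_col)
)
SELECT id,
       max_by(k, v) AS key,    -- key paired with the largest value
       max(v)       AS value   -- the largest value itself
FROM dataset
-- unnesting a map yields one row per entry, with key and value columns
CROSS JOIN UNNEST(CAST(json_parse(json_col) AS MAP(VARCHAR, DOUBLE))) AS u (k, v)
GROUP BY id
```

For the two sample rows this should give key3 with 3.3 for id 1 and key1 with 4.4 for id 2. Unlike the rank() version in the question, max_by returns a single key per id when several keys tie for the maximum.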