Hive: string JSON list to array with a specific field

In Hive, I have a JSON list stored as a string, and I want to select an array containing the values of one specific field.

For example, given

[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]

return the array of key1 values:

[val1,val3,val5]

How can I do this?

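Convert the string to an array of JSON strings: remove the enclosing [] and split on the commas that sit between } and {. Then explode the array, extract key1 from each element with get_json_object, and use collect_list to gather the values into an array. See the comments in the code: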

with mytable as ( -- example data: a single row
 select '[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]' as json_string
)

select collect_list(                                            -- collect the values into an array
                    get_json_object(json_map_string, '$.key1')  -- extract key1
                   ) as key1_array
from
(
-- backslashes are doubled because Hive string literals treat \ as an escape character
select  split(regexp_replace(json_string, '^\\[|\\]$', ''),     -- remove the enclosing []
              '(?<=\\}),(?=\\{)'                                -- split on commas only after } and before {
             ) as json_array                                    -- array of JSON object strings
  from mytable
) s
 lateral view outer explode(json_array) e as json_map_string    -- one row per array element
;

Result:

key1_array
["val1","val3","val5"]
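
The query above has no GROUP BY, so collect_list aggregates over every row of the table. If the table holds several rows and you want one array per row, a row identifier is needed to group on. The following is a minimal sketch under that assumption: the id column and the two sample rows are hypothetical and do not come from the question. The regex backslashes are doubled because Hive string literals treat the backslash as an escape character.

```sql
with mytable as ( -- hypothetical sample data; the id column is an assumption
 select 1 as id, '[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"}]' as json_string
 union all
 select 2 as id, '[{"key1":"val5","key2":"val6"}]' as json_string
)

select s.id,
       collect_list(get_json_object(e.json_map_string, '$.key1')) as key1_array -- one array per id
from
(
select id,
       split(regexp_replace(json_string, '^\\[|\\]$', ''), -- remove the enclosing []
             '(?<=\\}),(?=\\{)'                            -- split on commas only after } and before {
            ) as json_array
  from mytable
) s
 lateral view outer explode(s.json_array) e as json_map_string -- one row per array element
group by s.id
;
```

Note that the split pattern expects the comma to follow } directly, with no whitespace between the objects; input formatted differently would need the pattern adjusted.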