Unnesting map values as individual columns in Athena / presto

My question is somewhat similar to this one (). In my case, however, I know in advance which columns I need.

My use case is as follows.

I have a JSON blob with the following structure:

{
  "reqId" : "1234",
  "clientId" : "client",
  "response" : [
                 {
                   "name" : "Susan",
                   "projects" : [
                       {
                          "name" : "project1",
                          "completed" : true
                       },
                       {
                          "name" : "project2",
                          "completed" : false
                       }
                   ]
                 },
                 {
                   "name" : "Adams",
                   "projects" : [
                       {
                          "name" : "project1",
                          "completed" : true
                       },
                       {
                          "name" : "project2",
                          "completed" : false
                       }
                   ]
                 }
               ]
}

I need to create a view that returns output like this:

    name  |  project    |  Completed |
----------+-------------+------------+
    Susan |  project1   |   true     |
    Susan |  project2   |   false    |
    Adams |  project1   |   true     |
    Adams |  project2   |   false    |

I have tried the following, among other things. This is the closest I could get:

WITH dataset AS (
  SELECT 'Susan' as name, transform(filter(CAST(json_extract('{
           "projects": [{"name":"project1", "completed":false}, {"name":"project3", "completed":false},
           {"name":"project2", "completed":true}]}', '$.projects') AS ARRAY<MAP<VARCHAR, VARCHAR>>), p -> (p['name'] != 'project1')), p -> ROW(map_values(p))) AS projects
)
SELECT * from dataset
CROSS JOIN UNNEST(projects)

This is the output I get:


    name    projects                                                        _col2
1   Susan   [{field0=[project3, false]}, {field0=[project2, true]}] {field0=[project3, false]}
2   Susan   [{field0=[project3, false]}, {field0=[project2, true]}] {field0=[project2, true]}

Basically, I want to unnest the key-value pairs of my map into separate columns. How can I do this in Presto / Athena?
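The {field0=[...]} values above come from ROW(map_values(p)), which wraps the whole array of map values in a single anonymous field. Since the keys are known in advance, one sketch that stays close to that attempt is to keep the MAP<VARCHAR, VARCHAR> cast, drop the transform(... ROW(...)) step, UNNEST the array of maps into a single map column, and then read each known key with a subscript. The aliases d, t, p, project and completed are illustrative choices, not anything Athena requires.

```sql
-- Same sample data as the attempt above, kept as ARRAY<MAP<VARCHAR, VARCHAR>>
WITH dataset AS (
  SELECT 'Susan' AS name,
         CAST(json_extract('{"projects": [{"name":"project1", "completed":false},
                                          {"name":"project3", "completed":false},
                                          {"name":"project2", "completed":true}]}',
                           '$.projects') AS ARRAY<MAP<VARCHAR, VARCHAR>>) AS projects
)
SELECT d.name,
       p['name']      AS project,    -- one output column per known map key
       p['completed'] AS completed   -- VARCHAR, because the map values are VARCHAR
FROM dataset d
-- UNNEST of an array of maps yields one column (p) holding one map per row
CROSS JOIN UNNEST(d.projects) AS t (p)
-- Filtering here replaces the filter(...) lambda from the original attempt
WHERE p['name'] != 'project1';
```

Because the map values are VARCHAR, completed comes back as the strings 'true' / 'false'. Subscripting a key that is missing from a map raises an error in Presto; if some objects may lack a key, element_at(p, 'completed') returns NULL instead.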

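Your original JSON example was invalid: it was missing a , after "name" : "Susan" and after "name" : "Adams". Apart from that, you can get the expected output with the query below. You need to UNNEST twice and also apply a few casts: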

-- Sample data: the JSON document from the question, parsed into a JSON value
with dataset as
(
    select json_parse('{"reqId" : "1234","clientId" : "client","response" : [{"name" : "Susan","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]},{"name" : "Adams","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]}]}') as json_col
)
-- First UNNEST: one row per element of the top-level "response" array
,unnest_response as
(
    select * 
    from dataset
    cross join UNNEST(cast(json_extract(json_col, '$.response') as array<JSON>)) as t (response)
)
-- Second UNNEST: one row per element of each person's "projects" array.
-- json_extract_scalar returns VARCHAR, so project_completed is 'true' / 'false'.
select 
json_extract_scalar(response, '$.name') name,
json_extract_scalar(project, '$.name') project_name,
json_extract_scalar(project, '$.completed') project_completed
from unnest_response
cross join UNNEST(cast(json_extract(response, '$.projects') as array<JSON>)) as t (project);
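Since the question asks for a view, here is a sketch of a variant that casts the JSON straight to an array of typed ROWs, so completed stays BOOLEAN and no json_extract_scalar calls are needed. It assumes an engine version where UNNEST expands an array of ROWs into one column per field (older Presto releases returned a single ROW column instead), and the table my_json_table, its column json_col and the view name person_projects are hypothetical placeholders, not names from the question.

```sql
-- Hypothetical source: my_json_table, one JSON document per row in json_col
CREATE OR REPLACE VIEW person_projects AS
SELECT
    r.person_name  AS name,
    p.project_name AS project,
    p.completed                      -- BOOLEAN, taken from the ROW type below
FROM my_json_table
-- First UNNEST: each "response" element becomes a row with two columns;
-- JSON object keys are matched to the ROW field names
CROSS JOIN UNNEST(
    CAST(json_extract(json_col, '$.response')
         AS ARRAY(ROW(name VARCHAR,
                      projects ARRAY(ROW(name VARCHAR, completed BOOLEAN)))))
) AS r (person_name, projects)
-- Second UNNEST: each project of that person becomes its own row
CROSS JOIN UNNEST(r.projects) AS p (project_name, completed);
```

Declaring the types once in the CAST keeps the column types explicit in the view definition, at the cost of having to update the ROW type if the JSON shape changes.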