Splitting a nested dict-like varchar column into multiple columns using Presto SQL

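In my table I have a column that is a varchar but holds a nested, dictionary-like structure (three levels of nesting). Some entries have multiple key-value pairs (customer ID and name), while others have only one (customer ID). For example: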

column
{
 "customer_type1": {
                    "location1": {"customerid":"12345","name":"John"},
                    "location2": {"customerid":"12346","name":"Conor"}
                   },
 "customer_type2": {
                    "location3": {"customerid":"12347","name":"Brian"},
                    "location4": {"customerid":"12348"}
                   }
 }

I need a query that breaks the column out into a table like this:

customer_type   Location   Customer_ID  Name
customer_type1  location1  12345        John
customer_type1  location2  12346        Conor
customer_type2  location3  12347        Brian
customer_type2  location4  12348        

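I know a solution for extracting a single level of key-value pairs, but I have not been able to adapt it to a nested dictionary like this one. I am using PrestoSQL.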

--query for single nested key-value pair
select json_extract_scalar(json_parse(column), '$.customer_id') customerid,
    json_extract_scalar(json_parse(column), '$.name') name
from dataset

The usual approach for dynamic JSON like this is to cast it to a MAP, in this case a nested map, MAP(VARCHAR, MAP(VARCHAR, JSON)), and then use unnest to flatten the result:

-- sample data
WITH dataset (json_str) AS (
    VALUES (
            '{
 "customer_type1": {
                    "location1": {"customerid":"12345","name":"John"}, 
                    "location2": {"customerid":"12346","name":"Conor"}
                   },
 "customer_type2": {
                    "location3": {"customerid":"12347","name":"Brian"}, 
                    "location4": {"customerid":"12348"} 
                   }
 }'
        )
) 

-- query
select customer_type, 
    location, 
    json_extract_scalar(cust_json, '$.customerid') customer_id, 
    json_extract_scalar(cust_json, '$.name') name
from (
        select cast(json_parse(json_str) as map(varchar, map(varchar, json))) as maps
        from dataset
    )
cross join unnest(maps) as t1(customer_type, location_map)
cross join unnest(location_map) as t2(location, cust_json)

Output:

customer_type   location   customer_id  name
customer_type1  location1  12345        John
customer_type1  location2  12346        Conor
customer_type2  location3  12347        Brian
customer_type2  location4  12348
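
If every innermost value is a plain scalar (an assumption that holds for the sample data, but may not hold for yours), a possible variant is to cast all three levels to maps and read the fields with element_at instead of json_extract_scalar. This is only a sketch: the aliases t1, t2 and cust are names chosen for illustration, and the cast would fail if any leaf value were itself an object or array.

```sql
-- sample data (same shape as in the question)
WITH dataset (json_str) AS (
    VALUES (
        '{
          "customer_type1": {
            "location1": {"customerid":"12345","name":"John"},
            "location2": {"customerid":"12346","name":"Conor"}
          },
          "customer_type2": {
            "location3": {"customerid":"12347","name":"Brian"},
            "location4": {"customerid":"12348"}
          }
        }'
    )
)

-- query: all three nesting levels are cast to maps (assumes scalar leaf values)
SELECT customer_type,
    location,
    -- element_at returns NULL when the key is absent,
    -- whereas the subscript form cust['name'] raises an error
    element_at(cust, 'customerid') AS customer_id,
    element_at(cust, 'name') AS name
FROM (
        SELECT cast(json_parse(json_str)
                    AS map(varchar, map(varchar, map(varchar, varchar)))) AS maps
        FROM dataset
    )
CROSS JOIN unnest(maps) AS t1(customer_type, location_map)
CROSS JOIN unnest(location_map) AS t2(location, cust)
```

For location4, which has no "name" key, element_at yields NULL, matching the blank cell in the desired output. Keeping the innermost level as JSON, as in the query above, is the more forgiving choice when the leaf structure can vary.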