Splitting a nested dict-like varchar column into multiple columns using Presto SQL
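In my table I have a column that is a varchar but holds a nested, dictionary-like structure (three levels of nesting). Some entries have several key-value pairs (customer ID and name), while others have only one (customer ID). For example: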
column
{
"customer_type1": {
"location1": {"customerid":"12345","name":"John"},
"location2": {"customerid":"12346","name":"Conor"}
},
"customer_type2": {
"location3": {"customerid":"12347","name":"Brian"},
"location4": {"customerid":"12348"}
}
}
I need a query that breaks the column out into a table like this:
customer_type Location Customer_ID Name
customer_type1 location1 12345 John
customer_type1 location2 12346 Conor
customer_type2 location3 12347 Brian
customer_type2 location4 12348
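I know a solution for extracting a single level of key-value pairs, but I haven't been able to adapt it to a nested dictionary like this one. I am using PrestoSQL.
-- query for a single (non-nested) set of key-value pairs
select json_extract_scalar(json_parse(column), '$.customerid') customerid,
json_extract_scalar(json_parse(column), '$.name') name
from dataset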
The usual approach for dynamic JSON like this is to cast it to a MAP, in this case a nested map, MAP(VARCHAR, MAP(VARCHAR, JSON)), and then use unnest to flatten the result:
-- sample data
WITH dataset (json_str) AS (
VALUES (
'{
"customer_type1": {
"location1": {"customerid":"12345","name":"John"},
"location2": {"customerid":"12346","name":"Conor"}
},
"customer_type2": {
"location3": {"customerid":"12347","name":"Brian"},
"location4": {"customerid":"12348"}
}
}'
)
)
-- query
select customer_type,
location,
json_extract_scalar(cust_json, '$.customerid') customer_id,
json_extract_scalar(cust_json, '$.name') name
from (
select cast(json_parse(json_str) as map(varchar, map(varchar, json))) as maps
from dataset
)
cross join unnest(maps) as t(customer_type, location_map)
cross join unnest(location_map) as t(location, cust_json)
Output:
customer_type | location | customer_id | name |
---|---|---|---|
customer_type1 | location1 | 12345 | John |
customer_type1 | location2 | 12346 | Conor |
customer_type2 | location3 | 12347 | Brian |
customer_type2 | location4 | 12348 | |
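As a possible variation on the query above, the innermost objects can also be cast to a map instead of being kept as JSON. The sketch below assumes every leaf value is a scalar that can be cast to varchar, which holds for the sample data but may not hold for your real column. It uses `element_at`, which returns NULL for a missing key, so the row without a `name` should still come through. The aliases `t1`, `t2` and `cust` are illustrative names, not part of the original answer.

```sql
-- sample data (same as above)
WITH dataset (json_str) AS (
    VALUES (
        '{
            "customer_type1": {
                "location1": {"customerid":"12345","name":"John"},
                "location2": {"customerid":"12346","name":"Conor"}
            },
            "customer_type2": {
                "location3": {"customerid":"12347","name":"Brian"},
                "location4": {"customerid":"12348"}
            }
        }'
    )
)
-- query: three-level map, assuming all leaf values are scalars
SELECT customer_type,
       location,
       element_at(cust, 'customerid') AS customer_id, -- NULL if the key is absent
       element_at(cust, 'name') AS name               -- NULL if the key is absent
FROM (
    SELECT cast(json_parse(json_str)
                AS map(varchar, map(varchar, map(varchar, varchar)))) AS maps
    FROM dataset
)
CROSS JOIN UNNEST(maps) AS t1(customer_type, location_map)
CROSS JOIN UNNEST(location_map) AS t2(location, cust)
```

If some leaf values can themselves be objects or arrays, keeping the innermost level as JSON and using json_extract_scalar, as in the original query, is likely the safer choice.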