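Flatten multiple named arrays within a VARIANT JSON column in Snowflake
I have a web scraper that dumps data into a VARIANT column in a Snowflake database.
It scrapes page data and then creates a JSON array for each table found on the page.
Here is an example of the JSON; I'll use football as an analogy: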
{
"dom_url": "https://www.soccertables.com/european_tables",
"event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
"event_utc_time": "2020-05-11 09:01:14.821",
"ip_address": "125.238.134.96",
"table_1": [
{
"position": "1",
"team_name": "Liverpool",
"games_played": "29",
"games_won": "26",
"games_drawn": "2",
"games_lost": "1",
"goals_for": "75",
"goals_against": "35",
"points": "80"
},
{
"position": "2",
"team_name": "Man. City",
"games_played": "29",
"games_won": "20",
"games_drawn": "5",
"games_lost": "4",
"goals_for": "60",
"goals_against": "45",
"points": "65"
},
{
"position": "...",
"team_name": "...",
"games_played": "...",
"games_won": "...",
"games_drawn": "...",
"games_lost": "...",
"goals_for": "...",
"goals_against": "...",
"points": "..."
}
],
"table_2": [
{
"position": "1",
"team_name": "Bayern Munich",
"games_played": "29",
"games_won": "26",
"games_drawn": "2",
"games_lost": "1",
"goals_for": "75",
"goals_against": "35",
"points": "80"
},
{
"position": "2",
"team_name": "Bayer Leverkussen",
"games_played": "29",
"games_won": "20",
"games_drawn": "5",
"games_lost": "4",
"goals_for": "60",
"goals_against": "45",
"points": "65"
},
{
"position": "...",
"team_name": "...",
"games_played": "...",
"games_won": "...",
"games_drawn": "...",
"games_lost": "...",
"goals_for": "...",
"goals_against": "...",
"points": "..."
}
],
"referrer_url": "https://www.soccertables.com"
}
Ideally, I want the output to be a flat, relational table:
table_name  position  team_name      games_played  etc...
table_1     1         Liverpool      29            ...
table_1     2         Man. City      29            ...
table_2     1         Bayern Munich  29            ...
....
I know that if I were only interested in table_1, I could do this:
SELECT v.value:position::NUMBER POSITION
, v.value:team_name::STRING TEAM_NAME
, v.value:games_played::NUMBER GAMES_PLAYED
, ...
FROM JSON_TABLE a1, LATERAL FLATTEN(JSON_DATA:table_1) v
I could do the same for table_2 and UNION the results, but there can be any number of table_N keys.
I have looked at using multiple LATERAL FLATTENs:
SELECT v.value:position::NUMBER POSITION
, v.value:team_name::STRING TEAM_NAME
, v.value:games_played::NUMBER GAMES_PLAYED
, ...
FROM JSON_TABLE a1, LATERAL FLATTEN(JSON_DATA:table_1) v, LATERAL FLATTEN(JSON_DATA:table_2) v2
But this duplicates data and does not let me put each table's columns into a single relational structure.
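For context, the duplication arises because each LATERAL FLATTEN in the FROM list is joined to the rows produced before it, so flattening two sibling arrays pairs every element of the first with every element of the second. A minimal sketch, using a made-up two-key object in place of the real JSON_DATA (the inline literal and the names t, j, v, v2 are illustrative only):

```sql
-- Hypothetical miniature of the document: two sibling arrays of two elements each.
WITH t AS (
    SELECT PARSE_JSON('{"table_1": ["a", "b"], "table_2": ["c", "d"]}') AS j
)
SELECT v.value::STRING  AS from_table_1,
       v2.value::STRING AS from_table_2
FROM t,
     LATERAL FLATTEN(INPUT => t.j:table_1) v,    -- 2 rows
     LATERAL FLATTEN(INPUT => t.j:table_2) v2;   -- 2 more rows for each row of v
-- Result: 4 rows (a/c, a/d, b/c, b/d), i.e. a cross product rather than a union.
```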
I'm sure I'm missing something simple here, but I've reached the point where I've stared at it for too long and just can't see it.
Thanks in advance,
S
If you are trying to create a single flat view of the table_n data, along with the first-level attributes, then something like this will work.
WITH x AS (
SELECT '{
"dom_url": "https://www.soccertables.com/european_tables",
"event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
"event_utc_time": "2020-05-11 09:01:14.821",
"ip_address": "125.238.134.96",
"table_1": [
{
"position": "1",
"team_name": "Liverpool",
"games_played": "29",
"games_won": "26",
"games_drawn": "2",
"games_lost": "1",
"goals_for": "75",
"goals_against": "35",
"points": "80"
},
{
"position": "2",
"team_name": "Man. City",
"games_played": "29",
"games_won": "20",
"games_drawn": "5",
"games_lost": "4",
"goals_for": "60",
"goals_against": "45",
"points": "65"
},
{
"position": "...",
"team_name": "...",
"games_played": "...",
"games_won": "...",
"games_drawn": "...",
"games_lost": "...",
"goals_for": "...",
"goals_against": "...",
"points": "..."
}
],
"table_2": [
{
"position": "1",
"team_name": "Bayern Munich",
"games_played": "29",
"games_won": "26",
"games_drawn": "2",
"games_lost": "1",
"goals_for": "75",
"goals_against": "35",
"points": "80"
},
{
"position": "2",
"team_name": "Bayer Leverkussen",
"games_played": "29",
"games_won": "20",
"games_drawn": "5",
"games_lost": "4",
"goals_for": "60",
"goals_against": "45",
"points": "65"
},
{
"position": "...",
"team_name": "...",
"games_played": "...",
"games_won": "...",
"games_drawn": "...",
"games_lost": "...",
"goals_for": "...",
"goals_against": "...",
"points": "..."
}
],
"referrer_url": "https://www.soccertables.com"
}' as var)
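-- Top-level attributes repeat on every row; x3.value is one element of a table_N array.
SELECT
parse_json(x.var):dom_url::string AS dom_url,
parse_json(x.var):event_id::string AS event_id,
parse_json(x.var):event_utc_time::string AS event_utc_time,
parse_json(x.var):ip_address::string AS ip_address,
x3.value:games_drawn::string AS games_drawn,
x3.value:games_lost::string AS games_lost,
x3.value:games_played::string AS games_played,
x3.value:games_won::string AS games_won,
x3.value:goals_against::string AS goals_against,
x3.value:goals_for::string AS goals_for,
x3.value:points::string AS points,
x3.value:position::string AS position,
x3.value:team_name::string AS team_name
FROM x
,LATERAL FLATTEN(parse_json(x.var)) x2 -- one row per top-level key
,LATERAL FLATTEN(x2.value) x3; -- one row per array element; scalar values yield no rows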
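The CTE is obviously just there to demonstrate with the sample JSON you provided. If you care which records came from which table, you can also include x2.key as an element in the SELECT.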
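As a sketch of that last suggestion applied to the table from the question: JSON_TABLE and JSON_DATA are the asker's names, while the aliases t and r, the STARTSWITH filter on the key prefix, and the TRY_TO_NUMBER casts are assumptions added here rather than part of the original answer.

```sql
-- Assumes JSON_TABLE.JSON_DATA is a VARIANT holding one scraped document per row.
SELECT t.key                                       AS table_name,
       TRY_TO_NUMBER(r.value:position::STRING)     AS position,
       r.value:team_name::STRING                   AS team_name,
       TRY_TO_NUMBER(r.value:games_played::STRING) AS games_played
FROM JSON_TABLE a1,
     LATERAL FLATTEN(INPUT => a1.JSON_DATA) t,   -- one row per top-level key
     LATERAL FLATTEN(INPUT => t.value) r         -- one row per element of that key's array
WHERE STARTSWITH(t.key, 'table_');               -- keep only the table_N arrays
```

The key filter is optional with the sample data, since scalar attributes such as dom_url produce no rows in the second FLATTEN, but it guards against other top-level arrays or objects appearing later. TRY_TO_NUMBER returns NULL instead of failing on placeholder values such as "...".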