Flatten JSON string in BigQuery
I have a custom Airbyte job that fails to normalize the data, so I need to do it manually. The following data comes from our HR system:
{
"title": "My Report",
"fields": [{
"id": "employeeNumber",
"name": "Employee #"
},
{
"id": "firstName",
"name": "First Name"
},
{
"id": "lastName",
"name": "Last Name"
}],
"employees": [{
"employeeNumber": "1234",
"firstName": "Ann",
"lastName": "Perkins"
},
{
"employeeNumber": "5678",
"firstName": "Bob",
"lastName": "Builder"
}]
}
My current BigQuery table looks like this (the JSON is stored as a string):
_airbyte_ab_id | _airbyte_emitted_at | _airbyte_data |
---|---|---|
123abc | 2022-01-30 19:41:59 UTC | {"title": "My Datawareouse", "fields": [ {"id": "employeeNumber", "name": "Employee_Number"}, {"id": "firstName", "name": "First_Name" }, { "id": "lastName", "name": "Last_Name"} ], "employees": [ { "employeeNumber": "1234", "firstName": "Ann", "lastName": "Perkins" }, { "employeeNumber": "5678", "firstName": "Bob", "lastName": "Builder" } ] } |
I am trying to normalize the table so that it looks like this:
_airbyte_ab_id | _airbyte_emitted_at | Employee_Number | First_Name | Last_Name |
---|---|---|---|---|
123abc | 2022-01-30 19:41:59 UTC | 1234 | Ann | Perkins |
123abc | 2022-01-30 19:41:59 UTC | 5678 | Bob | Builder |
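How can I flatten the JSON into columns as in the example above, using SQL in BigQuery? (The script will be run from dbt, but for now I am just trying to get a valid query to run.)
I should add that the actual JSON has many more fields, that they may change, and that I expect things like "middle name" to be null. So, in a perfect world, I would not have to define each column name, but have it run dynamically by reading the "Fields" array.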
How to flatten the json into columns as the example above, using SQL in bigquery?
Consider the approach below:
select _airbyte_ab_id, _airbyte_emitted_at,
json_value(employee, '$.employeeNumber') employeeNumber,
json_value(employee, '$.firstName') firstName,
json_value(employee, '$.lastName') lastName
from your_table,
unnest(json_extract_array(_airbyte_data, '$.employees')) employee
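If applied to the sample data in your question, the output is one row per employee (1234 / Ann / Perkins and 5678 / Bob / Builder), each carrying the same _airbyte_ab_id and _airbyte_emitted_at.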
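To try the query above without touching the real table, the sample row from the question can be inlined as a CTE. This is only a sketch: the `your_table` CTE stands in for the actual Airbyte raw table, the output aliases follow the column names in the desired result, and `middleName` is a hypothetical key that is absent from the sample, included to show that `json_value` returns NULL for a missing path.

```sql
-- Stand-in for the real Airbyte raw table, holding the sample row from the question.
with your_table as (
  select
    '123abc' as _airbyte_ab_id,
    timestamp '2022-01-30 19:41:59 UTC' as _airbyte_emitted_at,
    '''{"title": "My Datawareouse",
        "fields": [ {"id": "employeeNumber", "name": "Employee_Number"},
                    {"id": "firstName", "name": "First_Name"},
                    {"id": "lastName", "name": "Last_Name"} ],
        "employees": [ {"employeeNumber": "1234", "firstName": "Ann", "lastName": "Perkins"},
                       {"employeeNumber": "5678", "firstName": "Bob", "lastName": "Builder"} ] }''' as _airbyte_data
)
select
  _airbyte_ab_id,
  _airbyte_emitted_at,
  json_value(employee, '$.employeeNumber') as Employee_Number,
  json_value(employee, '$.firstName') as First_Name,
  json_value(employee, '$.lastName') as Last_Name,
  -- Hypothetical key that is not present in the sample data: evaluates to NULL.
  json_value(employee, '$.middleName') as Middle_Name
from your_table,
  -- One output row per element of the "employees" array.
  unnest(json_extract_array(_airbyte_data, '$.employees')) as employee
```

Once the result looks right, dropping the CTE and pointing `your_table` at the real table should be the only change needed.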
... in a perfect world, I would not have to define each column name, but have it run dynamically by reading the "Fields" array
If your fields are defined dynamically and can even differ from row to row, I would recommend considering the flattening approach below:
select _airbyte_ab_id, _airbyte_emitted_at,
md5(employee) employee_hash,
json_value(field, "$.id") key,
regexp_extract(employee, r'"' || json_value(field, "$.id") || '":"(.*?)"') value
from your_table,
unnest(json_extract_array(_airbyte_data, '$.employees')) employee,
unnest(json_extract_array(_airbyte_data, '$.fields')) field
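If applied to the sample data in your question, the output is one row per employee and field: a key/value pair (for example employeeNumber / 1234) together with the employee_hash that identifies which employee the pair belongs to.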
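If the key/value rows then need to be turned back into one column per field, one possible route is BigQuery scripting: build the list of field ids first, then splice it into a `PIVOT` through `EXECUTE IMMEDIATE`. The sketch below is not part of the original answer. It assumes that `your_table` is a real table (a CTE cannot be shared across script statements), that every field id is a valid column name containing no quote characters, and it introduces a hypothetical variable `field_ids`.

```sql
-- Hypothetical script variable holding a list such as: 'employeeNumber', 'firstName', 'lastName'
declare field_ids string;

-- Collect every distinct field id found in the "fields" arrays, each wrapped in single quotes.
set field_ids = (
  select string_agg(distinct "'" || json_value(field, '$.id') || "'", ', ')
  from your_table,
    unnest(json_extract_array(_airbyte_data, '$.fields')) as field
);

-- Run the key/value query from above and pivot the keys into columns.
execute immediate format("""
  select *
  from (
    select _airbyte_ab_id, _airbyte_emitted_at,
      md5(employee) as employee_hash,
      json_value(field, '$.id') as key,
      regexp_extract(employee, r'"' || json_value(field, '$.id') || r'":"(.*?)"') as value
    from your_table,
      unnest(json_extract_array(_airbyte_data, '$.employees')) as employee,
      unnest(json_extract_array(_airbyte_data, '$.fields')) as field
  )
  pivot (max(value) for key in (%s))
""", field_ids);
```

`PIVOT` groups by the remaining columns, so each employee_hash yields one row, and a field listed in "fields" but missing from an employee object comes out as NULL. Note that the regular expression only captures values written as quoted strings, so numeric or boolean values would need a different pattern.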