BigQuery: Look up an array of IDs (REPEATED RECORD) and join data from a secondary table using SQL
My data is structured as follows:
Products
| name | region_ids |
----------------------------------
| shoe | c32, a43, x53 |
| hat | c32, f42 |
# Schema
name STRING NULLABLE
region_ids RECORD REPEATED
region_ids.value STRING NULLABLE
Regions
| _id | name |
---------------------
| c32 | london |
| a43 | manchester |
| x53 | bristol |
| f42 | liverpool |
# Schema
_id STRING NULLABLE
name STRING NULLABLE
I want to look up each ID in the `region_ids` array and replace it with the matching region name, producing a table like this:
| _id | name | region_names |
----------------------------------------------
| d22 | shoe | london, manchester, bristol |
| t64 | hat | london, liverpool |
What is the best way to do this using Standard SQL?
Thanks,
A
The following works in BigQuery Standard SQL:
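#standardSQL
SELECT
  p._id,
  p.name,
  -- Concatenate the region names in the original array order
  STRING_AGG(r.name, ', ' ORDER BY OFFSET) AS region_names
FROM `project.dataset.Products` p,
  -- One row per array element; WITH OFFSET exposes the element's position
  UNNEST(p.region_ids) AS region_id WITH OFFSET
LEFT JOIN `project.dataset.Regions` r
  ON region_id.value = r._id
GROUP BY p._id, p.name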
You can test it with the sample data from your question, as in the following example:
#standardSQL
WITH `project.dataset.Products` AS (
  SELECT 'd22' _id, 'shoe' name, [STRUCT<value STRING>('c32'), STRUCT('a43'), STRUCT('x53')] region_ids UNION ALL
  SELECT 't64', 'hat', [STRUCT<value STRING>('c32'), STRUCT('f42')]
), `project.dataset.Regions` AS (
  SELECT 'c32' _id, 'london' name UNION ALL
  SELECT 'a43', 'manchester' UNION ALL
  SELECT 'x53', 'bristol' UNION ALL
  SELECT 'f42', 'liverpool'
)
SELECT
  p._id,
  p.name,
  STRING_AGG(r.name, ', ' ORDER BY OFFSET) AS region_names
FROM `project.dataset.Products` p,
  UNNEST(p.region_ids) AS region_id WITH OFFSET
LEFT JOIN `project.dataset.Regions` r
  ON region_id.value = r._id
GROUP BY p._id, p.name
The result is:
Row _id name region_names
1 d22 shoe london, manchester, bristol
2 t64 hat london, liverpool
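Judging by the output example in your question, you want region_names to be a string containing a comma-separated list of names.
However, if you need region_names as an array, you can replace STRING_AGG(r.name, ', ' ORDER BY OFFSET) with ARRAY_AGG(r.name ORDER BY OFFSET).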
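For reference, here is a sketch of the array variant run against the same sample data. The table names are the placeholders used above. The `IGNORE NULLS` modifier is my own addition and not strictly part of the replacement described: because of the LEFT JOIN, an ID with no match in Regions yields a NULL name, and as far as I know BigQuery rejects a result array that contains a NULL element. Drop it if every ID is guaranteed to match.

```sql
#standardSQL
WITH `project.dataset.Products` AS (
  SELECT 'd22' _id, 'shoe' name, [STRUCT<value STRING>('c32'), STRUCT('a43'), STRUCT('x53')] region_ids UNION ALL
  SELECT 't64', 'hat', [STRUCT<value STRING>('c32'), STRUCT('f42')]
), `project.dataset.Regions` AS (
  SELECT 'c32' _id, 'london' name UNION ALL
  SELECT 'a43', 'manchester' UNION ALL
  SELECT 'x53', 'bristol' UNION ALL
  SELECT 'f42', 'liverpool'
)
SELECT
  p._id,
  p.name,
  -- Assumption: IGNORE NULLS skips IDs that have no matching region,
  -- so the resulting array never contains a NULL element
  ARRAY_AGG(r.name IGNORE NULLS ORDER BY OFFSET) AS region_names
FROM `project.dataset.Products` p,
  -- One row per array element; OFFSET is the element's position in region_ids
  UNNEST(p.region_ids) AS region_id WITH OFFSET
LEFT JOIN `project.dataset.Regions` r
  ON region_id.value = r._id
GROUP BY p._id, p.name
```

Here region_names comes back as an ARRAY<STRING> per product, in the same order as the IDs in region_ids, rather than as a single comma-separated string.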