Apply a relation to an array in standard SQL
I have two BigQuery tables. Table 1 has the schema {id:String, colors:Array[String]} and looks like this:
| id | colors |
|------|-----------------------------|
| id_1 | ["blue", "green", "orange"] |
| id_2 | ["red" , "blue", "green" ] |
| ... | .... |
and Table 2, with the schema {color:String, number:Int}, associates colors with numbers and looks like this:
| color | number |
|-------|--------|
| "blue"| 0 |
| "red" | 1 |
| ... | ... |
I want to produce a table that looks like this:
| id | numbers |
|----|---------|
|id_1| [0,3,4] |
|id_2| [1,0,3] |
| ...|... |
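obtained by mapping each color in table_1 to its corresponding number. The only solution I could come up with is
SELECT id, ARRAY_AGG(number) AS numbers
FROM (table_1 CROSS JOIN UNNEST(table_1.colors) AS color) JOIN table_2 USING (color)
GROUP BY id
but this takes a very long time (probably because of the cross join).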
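To pin down exactly what the desired output means, here is a minimal in-memory sketch in plain Python (not BigQuery). The numbers for green (3) and orange (4) come from the sample data used in the answers below; the names `table_1`, `table_2` and `map_colors` are illustrative only. Like the inner JOIN in the queries, it drops any color that has no entry in table_2.

```python
# Rows of table_1: (id, colors)
table_1 = [
    ("id_1", ["blue", "green", "orange"]),
    ("id_2", ["red", "blue", "green"]),
]
# Rows of table_2: (color, number)
table_2 = [("blue", 0), ("red", 1), ("green", 3), ("orange", 4)]


def map_colors(rows, relation):
    """Replace each color with its number, keeping the array order.

    Colors missing from the relation are skipped, mirroring an inner JOIN.
    """
    lookup = dict(relation)  # color -> number
    return {
        row_id: [lookup[c] for c in colors if c in lookup]
        for row_id, colors in rows
    }


result = map_colors(table_1, table_2)
print(result)  # {'id_1': [0, 3, 4], 'id_2': [1, 0, 3]}
```

In SQL terms, `lookup` plays the role of table_2, and the list comprehension is the per-row UNNEST, JOIN and re-aggregation.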
You can also phrase it like this:
SELECT id,
       (SELECT ARRAY_AGG(number)
        FROM UNNEST(table_1.colors) color
        JOIN table_2 USING (color)
       ) AS numbers
FROM table_1;
I'm not sure whether this "local" per-row aggregation will perform better than the "overall" GROUP BY aggregation in BigQuery, but it's worth a try.
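Which form is faster can only be measured on BigQuery itself, but the two forms should return the same values. A rough local sketch, assuming Python's built-in `sqlite3` with SQLite's JSON1 functions as a stand-in for BigQuery: `colors` is stored as JSON text, `json_each()` stands in for UNNEST and `json_group_array()` for ARRAY_AGG. It shows that the per-row subquery and the global GROUP BY yield the same numbers per id. It says nothing about BigQuery performance.

```python
import json
import sqlite3

# Hypothetical local stand-in for the two BigQuery tables.
# SQLite has no ARRAY type, so colors is JSON text; json_each() acts like UNNEST.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id TEXT, colors TEXT);
CREATE TABLE table_2 (color TEXT, number INTEGER);
INSERT INTO table_1 VALUES
  ('id_1', '["blue", "green", "orange"]'),
  ('id_2', '["red", "blue", "green"]');
INSERT INTO table_2 VALUES ('blue', 0), ('red', 1), ('green', 3), ('orange', 4);
""")

# "Local" form: a correlated subquery builds each table_1 row's array separately.
PER_ROW = """
SELECT t1.id,
       (SELECT json_group_array(t2.number)
        FROM json_each(t1.colors) AS c
        JOIN table_2 AS t2 ON t2.color = c.value) AS numbers
FROM table_1 AS t1
"""

# "Overall" form: flatten every (id, color) pair, join, then GROUP BY id.
OVERALL = """
SELECT t1.id, json_group_array(t2.number) AS numbers
FROM table_1 AS t1, json_each(t1.colors) AS c
JOIN table_2 AS t2 ON t2.color = c.value
GROUP BY t1.id
"""


def run(sql):
    # Neither query fixes the element order inside an array, so compare sorted values.
    return {row_id: sorted(json.loads(nums)) for row_id, nums in conn.execute(sql)}


per_row = run(PER_ROW)
overall = run(OVERALL)
print(per_row == overall)  # True
```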
The following is for BigQuery Standard SQL:
#standardSQL
SELECT id,
ARRAY(
SELECT number FROM table_1.colors color
JOIN `project.dataset.table_2` USING (color)
) AS numbers
FROM `project.dataset.table_1` table_1
You can test and play with the query above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table_1` AS (
SELECT 'id_1' id, ["blue", "green", "orange"] colors UNION ALL
SELECT 'id_2', ["red" , "blue", "green" ]
), `project.dataset.table_2` AS (
SELECT 'blue' color, 0 number UNION ALL
SELECT 'red', 1 UNION ALL
SELECT 'green', 3 UNION ALL
SELECT 'orange', 4
)
SELECT id,
ARRAY(
SELECT number FROM table_1.colors color
JOIN `project.dataset.table_2` USING (color)
) AS numbers
FROM `project.dataset.table_1` table_1
Result: [screenshot of the query output]
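The expected output keeps the order of the colors (id_1 gives [0,3,4]), but an ARRAY(SELECT ...) subquery or ARRAY_AGG only guarantees element order when an order is specified. In BigQuery this is typically done with `UNNEST(table_1.colors) AS color WITH OFFSET AS pos` and an `ORDER BY pos`. The sketch below shows the same idea locally, assuming Python's `sqlite3` with JSON1 as a stand-in: `json_each(...).key` is the 0-based array index, so ordering by it preserves the original positions. Assembling the arrays in Python is a choice made for this sketch, not part of the answer.

```python
import sqlite3
from collections import defaultdict

# Local stand-in for the sample data above; colors is JSON text
# because SQLite has no ARRAY type (json_each needs SQLite's JSON1 functions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id TEXT, colors TEXT);
CREATE TABLE table_2 (color TEXT, number INTEGER);
INSERT INTO table_1 VALUES
  ('id_1', '["blue", "green", "orange"]'),
  ('id_2', '["red", "blue", "green"]');
INSERT INTO table_2 VALUES ('blue', 0), ('red', 1), ('green', 3), ('orange', 4);
""")

# c.key is the array index, the analogue of BigQuery's WITH OFFSET;
# sorting on it keeps each id's numbers in the original color order.
rows = conn.execute("""
SELECT t1.id, t2.number
FROM table_1 AS t1, json_each(t1.colors) AS c
JOIN table_2 AS t2 ON t2.color = c.value
ORDER BY t1.id, c.key
""").fetchall()

# Re-assemble one array per id; rows already arrive ordered by (id, position).
numbers = defaultdict(list)
for row_id, number in rows:
    numbers[row_id].append(number)

print(dict(numbers))  # {'id_1': [0, 3, 4], 'id_2': [1, 0, 3]}
```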
Something as simple as this:
select id, array_agg(number) as numbers from (
select id, c, t2.number from table_1 t1, unnest(t1.colors) c
join table_2 t2 on c = t2.color
)
group by 1