Apply a relation to an array in standard SQL

I have two BigQuery tables. Table 1 has the schema {id:String, colors:Array[String]} and looks like

| id   | colors                      |
|------|-----------------------------|
| id_1 | ["blue", "green", "orange"] |
| id_2 | ["red" , "blue", "green" ]  |
| ...  | ....                        |

and Table 2 maps colors to numbers, with the schema {color:String, number:Int}, and looks like

| color | number |
|-------|--------|
| "blue"| 0      |
| "red" | 1      |
| ...   | ...    |

I want to produce a table that looks like

| id | numbers |
|----|---------|
|id_1| [0,3,4] |
|id_2| [1,0,3] |
| ...|...      |

obtained by mapping each color in table_1 to its corresponding number. The only solution I could come up with is

```sql
SELECT id, ARRAY_AGG(number) AS numbers
FROM (table_1 CROSS JOIN UNNEST(table_1.colors) AS color) JOIN table_2 USING (color)
GROUP BY id
```

but it takes a very long time (probably because of the cross join).

You could also phrase it like this:

```sql
SELECT id,
       (SELECT ARRAY_AGG(number) AS numbers
        FROM UNNEST(table_1.colors) color JOIN
             table_2
        USING (color)
       ) AS numbers
FROM table_1;
```

I'm not sure whether a per-row "local" aggregation will perform better in BigQuery than the "overall" aggregation, but it's worth a try.

The following works for BigQuery Standard SQL:

```sql
#standardSQL
SELECT id,
  ARRAY(
    SELECT number FROM table_1.colors color
    JOIN `project.dataset.table_2` USING (color)
  ) AS numbers
FROM `project.dataset.table_1` table_1
```

You can test and play with the query above using the sample data from your question, as in the following example:

```sql
#standardSQL
WITH `project.dataset.table_1` AS (
  SELECT 'id_1' id, ["blue", "green", "orange"] colors UNION ALL
  SELECT 'id_2', ["red" , "blue", "green" ]
), `project.dataset.table_2` AS (
  SELECT 'blue' color, 0 number UNION ALL
  SELECT 'red', 1 UNION ALL
  SELECT 'green', 3 UNION ALL
  SELECT 'orange', 4
)
SELECT id,
  ARRAY(
    SELECT number FROM table_1.colors color
    JOIN `project.dataset.table_2` USING (color)
  ) AS numbers
FROM `project.dataset.table_1` table_1
```
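None of the queries so far requests a particular element order, yet the expected output in the question ([0,3,4] for ["blue", "green", "orange"]) follows the order of `colors`. Below is a minimal sketch that makes the order explicit, assuming BigQuery Standard SQL and reusing the sample data from the test example (the CTE names `table_1`/`table_2` stand in for the real tables). `WITH OFFSET` exposes each element's position, and `ORDER BY pos` inside `ARRAY(...)` fixes the order of the resulting array.

```sql
#standardSQL
-- Sketch: per-row ARRAY subquery whose element order follows colors.
-- Sample data copied from the test example; swap the CTEs for real tables.
WITH table_1 AS (
  SELECT 'id_1' AS id, ['blue', 'green', 'orange'] AS colors UNION ALL
  SELECT 'id_2', ['red', 'blue', 'green']
), table_2 AS (
  SELECT 'blue' AS color, 0 AS number UNION ALL
  SELECT 'red', 1 UNION ALL
  SELECT 'green', 3 UNION ALL
  SELECT 'orange', 4
)
SELECT t1.id,
  ARRAY(
    SELECT t2.number
    -- WITH OFFSET gives the 0-based position of each color in t1.colors
    FROM UNNEST(t1.colors) AS color WITH OFFSET AS pos
    JOIN table_2 AS t2 USING (color)
    -- ARRAY() keeps the ORDER BY order of its subquery
    ORDER BY pos
  ) AS numbers
FROM table_1 AS t1
ORDER BY t1.id
```

On this sample data the query should return `[0,3,4]` for `id_1` and `[1,0,3]` for `id_2`. As in the original queries, a color with no row in `table_2` is silently dropped by the inner `JOIN`.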


Something as simple as this:

```sql
select id, array_agg(number) as numbers from (
  select id, c, t2.number from table_1 t1, unnest(t1.colors) c
  join table_2 t2 on c = t2.color
)
group by 1
```
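The same ordering concern applies to this `GROUP BY` form: without an `ORDER BY` inside `ARRAY_AGG`, BigQuery does not guarantee the element order, so `[0,3,4]` may come back in a different order. Here is a sketch of an order-preserving variant. It assumes the question's table names and assumes that `id` is unique in `table_1`; otherwise, rows that share an `id` are merged into one array.

```sql
#standardSQL
-- Sketch: GROUP BY variant that keeps numbers in the order of colors.
-- Assumes id is unique in table_1 (duplicate ids would be merged).
SELECT t1.id,
  -- pos is the 0-based index of each color within t1.colors
  ARRAY_AGG(t2.number ORDER BY pos) AS numbers
FROM table_1 AS t1
CROSS JOIN UNNEST(t1.colors) AS c WITH OFFSET AS pos
JOIN table_2 AS t2
  ON c = t2.color
GROUP BY t1.id
```

The unnested alias is `c` rather than `color` so that the `ON` condition cannot be confused with the `color` column of `table_2`.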