BigQuery: aggregating into arrays based on id and id_type

I have a table that looks like this:

WITH
  table AS (
  SELECT 1 object_id, 234 type_id, 2 type_level UNION ALL
  SELECT 1, 23, 1 UNION ALL
  SELECT 1, 24, 1 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 2, 34, 1 UNION ALL
  SELECT 2, 46, 1 UNION ALL
  SELECT 2, 465, 2 UNION ALL
  SELECT 2, 349, 2 UNION ALL
  SELECT 2, 4, 0 UNION ALL
  SELECT 2, 3, 0 )
SELECT
  object_id,
  type_id,
  type_level
FROM
  table

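Now I am trying to create three new columns for each object, type_level_0_array, type_level_1_array, and type_level_2_array, and aggregate the type_id values of the corresponding level into these arrays (I am not looking for comma-separated strings).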

So my result table should look like this:

+----+--------------------+--------------------+--------------------+
| id | type_level_0_array | type_level_1_array | type_level_2_array |
+----+--------------------+--------------------+--------------------+
| 1  | 2                  | 24,23              | 234                |
+----+--------------------+--------------------+--------------------+
| 2  | 3,4                | 34,46              | 465,349            |
+----+--------------------+--------------------+--------------------+

Is there any way to do this?

Update:

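Although my type_id values appear to follow a pattern (for example, level-0 types have length 1, level-1 types have length 2, and so on), there is no such pattern in my real dataset. The level can only be identified by looking at the type_level of each row.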

Try this. It works for me.

BigQuery raises an error if an array in the query result contains a NULL element, which is why IGNORE NULLS is needed.
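A minimal illustration of that point, using an inline array rather than the table from the question (the alias x and the output name ids are made up for this example). Without IGNORE NULLS, this query would fail because the resulting array would contain a NULL element:

```sql
#standardSQL
-- UNNEST produces three rows: 1, NULL, 2.
-- IGNORE NULLS drops the NULL row before the array is built,
-- so the output array holds only 1 and 2.
SELECT ARRAY_AGG(x IGNORE NULLS) AS ids
FROM UNNEST([1, NULL, 2]) AS x
```

The queries in this thread rely on the same behavior: the CASE or IF expression yields NULL for rows of the other levels, and IGNORE NULLS discards them.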

Edit: I have updated the code to be based on the type_level column.

WITH table
 AS (
  SELECT 1 object_id, 234 type_id, 2 type_level UNION ALL
  SELECT 1, 23, 1 UNION ALL
  SELECT 1, 24, 1 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 2, 34, 1 UNION ALL
  SELECT 2, 46, 1 UNION ALL
  SELECT 2, 465, 2 UNION ALL
  SELECT 2, 349, 2 UNION ALL
  SELECT 2, 4, 0 UNION ALL
  SELECT 2, 3, 0 )
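SELECT
  object_id
  -- Each CASE yields NULL for rows of other levels; IGNORE NULLS drops them.
  , ARRAY_AGG(CASE WHEN type_level = 0 THEN type_id ELSE NULL END IGNORE NULLS) AS type_level_0_array
  , ARRAY_AGG(CASE WHEN type_level = 1 THEN type_id ELSE NULL END IGNORE NULLS) AS type_level_1_array
  , ARRAY_AGG(CASE WHEN type_level = 2 THEN type_id ELSE NULL END IGNORE NULLS) AS type_level_2_array
FROM
  table
-- Group by object so that each object gets its own row of arrays.
GROUP BY
  object_id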

The following works for BigQuery Standard SQL:

#standardSQL
SELECT object_id,
  ARRAY_AGG(DISTINCT IF(type_level = 0, type_id, NULL) IGNORE NULLS) AS type_level_0_array,
  ARRAY_AGG(DISTINCT IF(type_level = 1, type_id, NULL) IGNORE NULLS) AS type_level_1_array,
  ARRAY_AGG(DISTINCT IF(type_level = 2, type_id, NULL) IGNORE NULLS) AS type_level_2_array
FROM `project.dataset.table`
GROUP BY object_id    

You can test and play with the above using the sample data from your question, as follows:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 object_id, 234 type_id, 2 type_level UNION ALL
  SELECT 1, 23, 1 UNION ALL
  SELECT 1, 24, 1 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 2, 34, 1 UNION ALL
  SELECT 2, 46, 1 UNION ALL
  SELECT 2, 465, 2 UNION ALL
  SELECT 2, 349, 2 UNION ALL
  SELECT 2, 4, 0 UNION ALL
  SELECT 2, 3, 0 )
SELECT object_id,
  ARRAY_AGG(DISTINCT IF(type_level = 0, type_id, NULL) IGNORE NULLS) AS type_level_0_array,
  ARRAY_AGG(DISTINCT IF(type_level = 1, type_id, NULL) IGNORE NULLS) AS type_level_1_array,
  ARRAY_AGG(DISTINCT IF(type_level = 2, type_id, NULL) IGNORE NULLS) AS type_level_2_array
FROM `project.dataset.table`
GROUP BY object_id   

Result

Row     object_id   type_level_0_array  type_level_1_array  type_level_2_array   
1       1           2                   24                  234  
                                        23       
2       2           4                   34                  349  
                    3                   46                  465
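
Note that the order of elements inside each array is not guaranteed, because ARRAY_AGG without an ORDER BY clause may return them in any order. If a stable order matters, one possible variation is sketched below. It assumes the same sample data; the by_level subquery and the level_0_id, level_1_id, and level_2_id aliases are illustrative names, not part of the original answer. The per-level value is computed first so that the ORDER BY key is exactly the aggregated expression, which BigQuery requires when DISTINCT is used.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 object_id, 234 type_id, 2 type_level UNION ALL
  SELECT 1, 23, 1 UNION ALL
  SELECT 1, 24, 1 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 2, 34, 1 UNION ALL
  SELECT 2, 46, 1 UNION ALL
  SELECT 2, 465, 2 UNION ALL
  SELECT 2, 349, 2 UNION ALL
  SELECT 2, 4, 0 UNION ALL
  SELECT 2, 3, 0 ),
by_level AS (
  -- One column per level; NULL where the row belongs to another level.
  SELECT object_id,
    IF(type_level = 0, type_id, NULL) AS level_0_id,
    IF(type_level = 1, type_id, NULL) AS level_1_id,
    IF(type_level = 2, type_id, NULL) AS level_2_id
  FROM `project.dataset.table` )
SELECT object_id,
  -- DISTINCT removes duplicates, IGNORE NULLS drops other levels,
  -- ORDER BY sorts the elements inside each array in ascending order.
  ARRAY_AGG(DISTINCT level_0_id IGNORE NULLS ORDER BY level_0_id) AS type_level_0_array,
  ARRAY_AGG(DISTINCT level_1_id IGNORE NULLS ORDER BY level_1_id) AS type_level_1_array,
  ARRAY_AGG(DISTINCT level_2_id IGNORE NULLS ORDER BY level_2_id) AS type_level_2_array
FROM by_level
GROUP BY object_id
ORDER BY object_id
```

With the sample data this should give [2], [23, 24], [234] for object 1 and [3, 4], [34, 46], [349, 465] for object 2.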