BigQuery: Concatenate two arrays and keep distinct values within MERGE statement

Question

I am working on a MERGE process that updates an array field with new data, but only adds values that are not already in the array.

target table
+-----+----------+
| id  |  arr_col |
+-----+----------+
| a   |  [1,2,3] |
| b   |    [0]   |
+-----+----------+

source table
+-----+----------+
| id  |  arr_col |
+-----+----------+
| a   | [3,4,5] |
| b   |  [0,0]   |
+-----+----------+

target table post-merge
+-----+-------------+
| id  |  arr_col    |
+-----+-------------+
| a   | [1,2,3,4,5] |
| b   |    [0]      |
+-----+-------------+

I tried using the following SQL in my MERGE statement:

merge into target
using source
  on target.id = source.id
when matched then
update set target.arr_col = array(
                             select distinct x 
                             from unnest(array_concat(target.arr_col, source.arr_col)) x
                            )

But BigQuery shows me the following error: Correlated Subquery is unsupported in UPDATE clause.

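Is there another way to update this array field via MERGE? The target and source tables can be quite large and the process runs daily, so I would like this to be an incremental update rather than recreating the whole table with new data every time.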

Answer 1

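The following is for BigQuery Standard SQL. The correlated subquery is moved out of the UPDATE clause and into the USING subquery, where it is allowed: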

merge into target
using (
  select id, 
    array(
      select distinct x 
      from unnest(source.arr_col || target.arr_col) as x
      order by x
    ) as arr_col
  from source 
  join target
  using(id)
) source
  on target.id = source.id
when matched then
update set target.arr_col = source.arr_col;
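To check the array expression before running the MERGE, the USING subquery can be tried on its own against inline copies of the sample data from the question. This is only a sketch: the two CTEs stand in for the real tables and are not part of the original answer.

```sql
-- Inline stand-ins for the question's sample tables (illustration only).
with target as (
  select 'a' as id, [1, 2, 3] as arr_col union all
  select 'b', [0]
),
source as (
  select 'a' as id, [3, 4, 5] as arr_col union all
  select 'b', [0, 0]
)
-- Same projection as the USING subquery in the MERGE above:
-- concatenate both arrays, then keep the distinct values in sorted order.
select id,
  array(
    select distinct x
    from unnest(source.arr_col || target.arr_col) as x
    order by x
  ) as arr_col
from source
join target
using (id);
```

With the sample data this should return [1, 2, 3, 4, 5] for id a and [0] for id b, matching the expected post-merge table in the question.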

Answer 2

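I want to expand on Mikhail Berlyant's answer, since my actual application differs slightly from the OP's: I also need to insert data when the merge condition is not met.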

merge into target
using (
  select id, 
    array(
      select distinct x 
      from unnest(
          /*  
          concat didn't work without case-when statement for 
          new data (i.e. target.id is null) 
          */
          case when target.id is not null then source.arr_col || target.arr_col 
          else source.arr_col
          end
      ) as x
      order by x
    ) as arr_col
  from source 
  left join target /* to be able to account for brand new data in source */
  using(id)
) source
  on target.id = source.id
when matched then
update set target.arr_col = source.arr_col
when not matched then
insert row;
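The case expression above is needed because concatenating an array with NULL yields NULL, and after the left join target.arr_col is NULL for ids that exist only in source. A possible alternative, which I have not run against a real table, is to replace the NULL with an empty array using ifnull. The sketch below assumes INT64 elements (for other element types a typed empty array such as array<string>[] may be needed), and the CTEs, including the extra id c, are made up for illustration.

```sql
-- Inline stand-ins for the tables; id 'c' exists only in source (hypothetical data).
with target as (
  select 'a' as id, [1, 2, 3] as arr_col
),
source as (
  select 'a' as id, [3, 4, 5] as arr_col union all
  select 'c', [7, 7]
)
select id,
  array(
    select distinct x
    -- ifnull swaps the NULL array of an unmatched target row for an empty array,
    -- so the concatenation never becomes NULL.
    from unnest(source.arr_col || ifnull(target.arr_col, [])) as x
    order by x
  ) as arr_col
from source
left join target
using (id);
```

Under these assumptions the query should give [1, 2, 3, 4, 5] for id a and [7] for id c, which is the same result the case expression produces.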
