BigQuery: Concatenate two arrays and keep distinct values within MERGE statement
I'm working on a MERGE process that updates an array field with new data, but only when the value is not already in the array.
target table
+----+---------+
| id | arr_col |
+----+---------+
| a  | [1,2,3] |
| b  | [0]     |
+----+---------+
source table
+----+---------+
| id | arr_col |
+----+---------+
| a  | [3,4,5] |
| b  | [0,0]   |
+----+---------+
target table post-merge
+----+-------------+
| id | arr_col     |
+----+-------------+
| a  | [1,2,3,4,5] |
| b  | [0]         |
+----+-------------+
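To try the statements in this thread, the sample tables above could be created roughly as follows. This is only a sketch: it assumes a BigQuery multi-statement script where temporary tables are acceptable, and that `arr_col` is an ARRAY<INT64>. With regular tables, the names would need a dataset qualifier instead.

```sql
-- Sample data from the question, built as temp tables (an assumption;
-- the real tables are presumably permanent and much larger).
create temp table target as
select 'a' as id, [1, 2, 3] as arr_col
union all
select 'b', [0];

create temp table source as
select 'a' as id, [3, 4, 5] as arr_col
union all
select 'b', [0, 0];
```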
I tried the following SQL in my MERGE statement:
merge into target
using source
on target.id = source.id
when matched then
  update set target.arr_col = array(
    select distinct x
    from unnest(array_concat(target.arr_col, source.arr_col)) x
  )
but BigQuery shows me the following error:
Correlated Subquery is unsupported in UPDATE clause.
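Is there another way to update this array field via MERGE? The target and source tables can be quite large, and the process runs daily, so I'd like it to be an incremental update rather than recreating the whole table with new data every time.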
The following works in BigQuery Standard SQL. The distinct array is computed in the USING subquery, so the UPDATE clause no longer contains a correlated subquery:
merge into target
using (
  select id,
    array(
      select distinct x
      from unnest(source.arr_col || target.arr_col) as x
      order by x
    ) as arr_col
  from source
  join target
  using(id)
) source
on target.id = source.id
when matched then
  update set target.arr_col = source.arr_col;
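Since the question mentions large tables and daily runs, one possible refinement is to skip ids whose source array adds nothing new, so those rows are not rewritten. The sketch below is my own variation on the statement above, not part of the original answer, and I have not measured whether it lowers cost on real data. It adds a WHERE clause that keeps an id only when some source value is missing from the target array.

```sql
merge into target
using (
  select id,
    array(
      select distinct x
      from unnest(source.arr_col || target.arr_col) as x
      order by x
    ) as arr_col
  from source
  join target
  using(id)
  -- Assumption: rows whose source values are all present in the target
  -- array already can be left untouched.
  where exists (
    select 1
    from unnest(source.arr_col) as s
    where s not in unnest(target.arr_col)
  )
) source
on target.id = source.id
when matched then
  update set target.arr_col = source.arr_col;
```

With the sample data, id `a` is kept (4 and 5 are new) while id `b` is filtered out, so its array stays `[0]`. Note that untouched rows keep their existing element order, whereas updated rows come out sorted.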
I'd like to expand on Mikhail Berlyant's answer, because my actual use case differs slightly from the OP's: I also need to insert data when the merge condition is not met.
merge into target
using (
  select id,
    array(
      select distinct x
      from unnest(
        /*
          concat didn't work without case-when statement for
          new data (i.e. target.id is null)
        */
        case when target.id is not null then source.arr_col || target.arr_col
          else source.arr_col
        end
      ) as x
      order by x
    ) as arr_col
  from source
  left join target /* to be able to account for brand new data in source */
  using(id)
) source
on target.id = source.id
when matched then
  update set target.arr_col = source.arr_col
when not matched then
  insert row
;
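The CASE is needed because, after the LEFT JOIN, `target.arr_col` is NULL for ids that exist only in source, and concatenating an array with NULL yields NULL, which unnests to no rows. A shorter way to express the same thing might be to replace the NULL with an empty array. This is a sketch assuming `arr_col` is an ARRAY<INT64> as in the example; for other element types the empty array may need an explicit cast.

```sql
merge into target
using (
  select id,
    array(
      select distinct x
      -- ifnull swaps the NULL array of unmatched rows for an empty one,
      -- so the concatenation keeps the source values.
      from unnest(source.arr_col || ifnull(target.arr_col, [])) as x
      order by x
    ) as arr_col
  from source
  left join target
  using(id)
) source
on target.id = source.id
when matched then
  update set target.arr_col = source.arr_col
when not matched then
  insert row;
```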