BigQuery Limit Rows Scanned by Merge DML

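Given the DML statement below, is there a way to limit the number of rows scanned in the target table? For example, say we have a field shard_id that the table is partitioned by. I know beforehand that all updates should happen within some range of shard_id. Is there a way to specify a WHERE clause on the target to limit the number of rows that need to be scanned, so that the update doesn't have to do a full table scan to look up the id?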

MERGE dataset.table_target target
USING dataset.table_source source
ON target.id = "123"
WHEN MATCHED THEN
UPDATE SET some_value = source.some_value
WHEN NOT MATCHED BY SOURCE AND id = "123" THEN
DELETE

The ON condition plays the role of the WHERE clause here; that is where you need to write your filter.

ON target.id = "123" AND DATE(target.shard_id) BETWEEN date1 AND date2

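For your case, doing the partition pruning in the ON condition is not correct. Instead, you should do it in the WHEN clauses.

There is an example for exactly this scenario at https://cloud.google.com/bigquery/docs/using-dml-with-partitioned-tables#pruning_partitions_when_using_a_merge_statement.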
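
Applied to the statement from the question, that advice might look roughly like the sketch below. It assumes shard_id is a DATE column and is the partitioning column of table_target; the two date literals are placeholders for the range you already know, not values from the question. If shard_id is a TIMESTAMP, the literals would need to change accordingly.

```sql
-- Sketch only: assumes table_target is partitioned on shard_id (DATE).
-- The date range below is a placeholder for the known shard_id range.
MERGE dataset.table_target AS target
USING dataset.table_source AS source
ON target.id = "123"
-- Filter on the partitioning column in each WHEN clause,
-- so the optimizer has a chance to prune target partitions.
WHEN MATCHED
  AND target.shard_id BETWEEN DATE '2018-01-01' AND DATE '2018-01-31' THEN
  UPDATE SET some_value = source.some_value
WHEN NOT MATCHED BY SOURCE
  AND target.id = "123"
  AND target.shard_id BETWEEN DATE '2018-01-01' AND DATE '2018-01-31' THEN
  DELETE
```

Whether partitions are actually pruned is up to the query optimizer, so it is worth comparing the bytes processed with and without the shard_id filter (for example with a dry run) before relying on it.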

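Basically, the ON condition is used as the matching condition that joins the target and source tables in a MERGE. A predicate in a join condition only decides which rows match, so every row of the left table is still returned; a WHERE clause actually filters rows out. The following two queries show the difference between a join condition and a WHERE clause.

Query 1: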

with
t1 as (
  select '2018-01-01' pt, 10 v1 union all
  select '2018-01-01', 20 union all
  select '2000-01-01', 10),
t2 as (select 10 v2)
select * from t1 left outer join t2 on v1=v2 and pt = '2018-01-01'

Result:

pt          v1  v2
2018-01-01  10  10
2018-01-01  20  NULL
2000-01-01  10  NULL

Query 2:

with
t1 as (
  select '2018-01-01' pt, 10 v1 union all
  select '2018-01-01', 20 union all
  select '2000-01-01', 10),
t2 as (select 10 v2)
select * from t1 left outer join t2 on v1=v2 where pt = '2018-01-01'

Result:

pt          v1  v2
2018-01-01  10  10
2018-01-01  20  NULL