Is it possible to partition nested tables in BigQuery?

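I am currently migrating my data warehouse to BigQuery. I have been trying to denormalize the database, since I have read that this can produce more efficient and cheaper queries. However, this has resulted in some nested tables. If every nested table has a "created_at" and a "last_modified_at" column, is there any way to use either of these values to partition my tables?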

No, you cannot partition a table by a column of a nested table. According to the docs:

You can partition BigQuery tables by:

Time-unit column: Tables are partitioned based on a TIMESTAMP, DATE, or DATETIME column in the table.

Ingestion time: Tables are partitioned based on the timestamp when BigQuery ingests the data.

Integer range: Tables are partitioned based on an integer column.
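As a rough illustration of the three options quoted above, the sketch below builds one `CREATE TABLE` statement per partitioning type. The dataset, table, and column names (`mydataset.orders`, `created_at`, `customer_id`) and the bucket bounds are hypothetical; the script only prints the DDL text and does not talk to BigQuery.

```python
# Sketch only: every table and column name here is a made-up example.
COLUMNS = "order_id INT64, customer_id INT64, created_at TIMESTAMP"

# One PARTITION BY clause per partitioning type listed in the docs.
PARTITION_CLAUSES = {
    # Time-unit column: partition on a top-level TIMESTAMP column.
    "time_unit_column": "PARTITION BY DATE(created_at)",
    # Ingestion time: partition on the pseudocolumn BigQuery fills in at load time.
    "ingestion_time": "PARTITION BY _PARTITIONDATE",
    # Integer range: buckets of width 100 from 0 up to 1000 (arbitrary bounds).
    "integer_range": "PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 1000, 100))",
}


def build_ddl(kind, table="mydataset.orders"):
    """Return the CREATE TABLE statement for the given partitioning kind."""
    clause = PARTITION_CLAUSES[kind]
    return f"CREATE TABLE {table} ({COLUMNS})\n{clause}"


if __name__ == "__main__":
    for kind in PARTITION_CLAUSES:
        print(build_ddl(kind))
        print()
```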

In addition, the partitioning column must be a top-level field; it cannot be a leaf field of a RECORD (STRUCT):

Limitations

You cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.

Time-unit column-partitioned tables are subject to the following limitations:

The partitioning column must be either a scalar DATE, TIMESTAMP, or DATETIME column. While the mode of the column can be REQUIRED or NULLABLE, it cannot be REPEATED (array-based). The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.

Integer-range partitioned tables are subject to the following limitations:

The partitioning column must be an INTEGER column. While the mode of the column may be REQUIRED or NULLABLE, it cannot be REPEATED (array-based). The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.
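The limitations quoted above can be turned into a quick pre-check. The following is a minimal sketch, assuming the schema is held as a list of dicts in the style of BigQuery's JSON schema files (`name`, `type`, `mode`, `fields`); the function name `can_partition_by` and the sample schema are hypothetical, and BigQuery itself remains the authority on what it accepts.

```python
# Types the quoted limitations allow for a partitioning column.
TIME_UNIT_TYPES = {"DATE", "TIMESTAMP", "DATETIME"}
INTEGER_TYPES = {"INTEGER", "INT64"}


def can_partition_by(schema, column_path):
    """Return (ok, reason) for a dotted column path such as 'items.created_at'."""
    parts = column_path.split(".")
    if len(parts) > 1:
        # A dotted path means a leaf inside a RECORD (STRUCT).
        return False, "must be a top-level field, not a leaf of a RECORD (STRUCT)"
    field = next((f for f in schema if f["name"] == parts[0]), None)
    if field is None:
        return False, "no such top-level column"
    if field.get("mode", "NULLABLE") == "REPEATED":
        return False, "REPEATED columns cannot be partitioning columns"
    if field["type"] not in TIME_UNIT_TYPES | INTEGER_TYPES:
        return False, f"type {field['type']} is not supported for partitioning"
    return True, "ok"


# Hypothetical denormalized table: one row per order, line items nested inside.
SCHEMA = [
    {"name": "order_id", "type": "INT64", "mode": "REQUIRED"},
    {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {
        "name": "items",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [{"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"}],
    },
]

if __name__ == "__main__":
    for path in ("created_at", "items.created_at", "items"):
        print(path, can_partition_by(SCHEMA, path))
```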

Clustering in BigQuery supports more data types than partitioning does, but you cannot cluster a table by a RECORD (STRUCT) column either:

Clustering columns must be top-level, non-repeated columns of one of the following types:

DATE, BOOL, GEOGRAPHY, INT64, NUMERIC, BIGNUMERIC, STRING, TIMESTAMP, DATETIME

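If your reason for partitioning is to make date/time queries more efficient, and each nested table covers a similar time range, I would suggest unnesting the nested table into the parent table. If you do not want to unnest it, it may help to add another column to your main table that holds the earliest or latest date from the nested table, and then partition by that new column.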
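The second suggestion can be sketched as a small transformation applied to each row before loading. This is only an illustration: the field names `items`, `created_at`, and `earliest_created_at`, and the helper `add_partition_column`, are assumptions rather than anything from the question.

```python
from datetime import datetime


def add_partition_column(row, nested_field="items", source="created_at",
                         target="earliest_created_at", pick=min):
    """Copy the earliest (or latest, with pick=max) nested date to a top-level field."""
    values = [
        child[source]
        for child in row.get(nested_field, [])
        if child.get(source) is not None
    ]
    enriched = dict(row)  # shallow copy, the input row is left untouched
    enriched[target] = pick(values) if values else None
    return enriched


if __name__ == "__main__":
    # Hypothetical denormalized row with a repeated nested record.
    order = {
        "order_id": 1,
        "items": [
            {"sku": "a", "created_at": datetime(2023, 3, 1)},
            {"sku": "b", "created_at": datetime(2023, 1, 5)},
        ],
    }
    print(add_partition_column(order)["earliest_created_at"])
```

The table could then be declared with a clause such as `PARTITION BY DATE(earliest_created_at)`. Keep in mind that partition pruning only applies when queries filter on the new top-level column, not on the nested `created_at` values.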

partitioning

denormalization

google-bigquery